Posted on: July 20, 2013
For hand-collected or hand-crafted data sets, such as manually collected survey data and other data sets where the person keying in the data on a digital device is short of an acceptable standard of typing experience, the conventional distance metrics will fail. This is because these metrics “assume” that the representation in the data set is a proxy for the intent of the end user, which is not true when unintentional errors creep in.
This requires a way to model the unintentional errors that have crept in, and to model the very structure of the keyboard that is used to key in the data. In this algorithm, I explicitly model the QWERTY keyboard as a graph, with the 26 letters of the English alphabet as the nodes of the graph & the distance between the nodes accounting for the
Scope: This algorithm identifies similar entities among the many in a hand-crafted data set. For example, for the two entities “Weka” and “Wkea”, it will identify them as one entity and, with a spell checker in the final disambiguating pass, also identify which of the two is the actual entity.
The limitation of this apparatus, though, is with words that are not in the English dictionary and so get no auto-suggestion, such as people’s names like “Ekta” and “Keta”. In that case it will still identify and mark the two entities as the same, having “identified the typo”, but it cannot say which one of them is actually the right representation.
In that sense, this application predicts intent in a global scope, but does NOT identify which of the individual entities is the actual representation.
The QWERTY representation
Algorithm & data structure representation:
- The three rows of English letters on the QWERTY keyboard are represented as nodes of the graph, with a distance of 1 between adjacent nodes. The adjacent nodes are picked to account for the fact that typographical errors are more likely to occur in a neighboring zone. For example, according to the schema above, Q is adjacent to W, A and S, the adjacency being represented by the neighborhood of that node. Similarly, for D as the key, W, E, R, S, F, X, C and V are its immediate neighborhood.
- The concept of the neighborhood also narrows down which keys are acceptable typos in the immediate neighborhood, so that the distance metric does not penalize the user.
- An extension of the adjacent nodes is the case where users inadvertently swap the left-hand and right-hand keys, for example typing “wkea” instead of “weka” because the user’s fingers are not synced up while typing. This can be due to inexperience in typing, or to pure errors while speed-typing (typical of surveys, call center chat data etc.). In this case the comparison, instead of being with the neighborhood/adjacent nodes as above, is against the left vs. right portions of the QWERTY keyboard, with defined positions for what is left and right. This ensures that when two consecutive characters are deemed “typos emanating from left and right hand asynchronization”, these characters do indeed come from the left and right parts of the keyboard.
- As the next step, the threshold for an acceptable difference between the words is defined. Currently this is set as one-third of the absolute difference in the length of the words being compared. This is because the difference in the lengths of the words being compared has to be within a defined length. For example, a user may type “asynchronus” instead of “asynchronous” (missing the “o” after the second “n”); given the threshold restrictions, the distance metric should then uniquely attribute this to “asynchronous”. (In this case, it will happen in the 1st step of auto-suggest itself, thereby narrowing the scope, or reinforcing the uniqueness of the attribute.)
- If the difference in the length of the words exceeds the threshold, then the algorithm places a higher belief that the words are different and uses the Levenshtein distance metric instead.
- As a final step, to identify which of the two keys is the rightful key (“Lengt” and “length” are both mapped to one entity, but the algorithm up to step 4 does not know which is the right “word”), the word being compared is passed through an “auto-suggest”, which looks up predefined words in the English dictionary and outputs the candidate(s) with the lowest distance metric.
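The neighborhood and left/right ideas from the steps above can be sketched roughly as follows. This is only an illustration and not the original implementation: it uses plain Python dictionaries rather than networkx, it assumes the three letter rows sit on a left-aligned grid (which reproduces the Q and D neighborhoods quoted above), and the left/right split in `LEFT_HAND` as well as the helper names `neighbors` and `is_hand_swap` are assumptions made for the example.

```python
# Illustrative sketch only: helper names and the left/right split are assumptions.
ROWS = ["QWERTYUIOP", "ASDFGHJKL", "ZXCVBNM"]

# Map each letter to its (row, column) position on a left-aligned grid.
POSITION = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def neighbors(key):
    """Return the keys at most one row and one column away from `key`."""
    key = key.upper()
    r, c = POSITION[key]
    return {
        other
        for other, (r2, c2) in POSITION.items()
        if other != key and abs(r - r2) <= 1 and abs(c - c2) <= 1
    }


# Assumed split of the letters between the left and the right hand.
LEFT_HAND = set("QWERTASDFGZXCVB")
RIGHT_HAND = set(POSITION) - LEFT_HAND


def is_hand_swap(a, b):
    """True if a and b differ only by one swap of two consecutive characters,
    one of them typed by the left hand and the other by the right hand."""
    a, b = a.upper(), b.upper()
    if len(a) != len(b):
        return False
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    if len(diff) != 2 or diff[1] != diff[0] + 1:
        return False
    i, j = diff
    if a[i] != b[j] or a[j] != b[i]:
        return False
    return (a[i] in LEFT_HAND and a[j] in RIGHT_HAND) or (
        a[i] in RIGHT_HAND and a[j] in LEFT_HAND
    )
```

With these definitions, `neighbors("D")` gives W, E, R, S, F, X, C and V, and `is_hand_swap("weka", "wkea")` is true because E and K fall on opposite sides of the assumed split.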
As a proposed extension to this algorithm, I intend to try out different dictionary approaches for auto-suggest (together with Apache Solr and Elasticsearch).
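Before committing to a dictionary back-end, the auto-suggest pass could be prototyped with the standard library alone. The sketch below is one such stand-in, not the approach used in this post: it relies on `difflib.get_close_matches`, which ranks candidates by a similarity ratio rather than by the QWERTY distance, and `DICTIONARY` is a tiny hypothetical word list in place of a real English dictionary.

```python
import difflib

# Hypothetical stand-in for a real English dictionary.
DICTIONARY = ["length", "weka", "asynchronous", "keyboard"]


def suggest(word, dictionary=DICTIONARY, n=1, cutoff=0.6):
    """Return up to n dictionary words that most resemble `word`.

    An empty list means no candidate reached the similarity cutoff.
    """
    return difflib.get_close_matches(word.lower(), dictionary, n=n, cutoff=cutoff)
```

For example, `suggest("Lengt")` picks out “length”. Note that a similarity cutoff does not by itself solve the proper-name limitation described earlier: a name that happens to resemble a dictionary word can still receive a suggestion.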
I used Python’s networkx to model the QWERTY keyboard, and built the QWERTY distance metric on top of it.
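One plausible way to turn the keyboard graph into a key-to-key distance is to count the hops on the shortest path between two keys. The sketch below is an assumption about how such a metric could look, not the original code: it runs a breadth-first search with the standard library where networkx would offer `shortest_path_length`, and it assumes a left-aligned grid of the three letter rows with a distance of 1 between adjacent keys. The names `GRAPH` and `key_distance` are hypothetical.

```python
from collections import deque

ROWS = ["QWERTYUIOP", "ASDFGHJKL", "ZXCVBNM"]
POSITION = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}

# Adjacency list: keys at most one row and one column apart share an edge.
GRAPH = {
    key: [
        other
        for other, (r2, c2) in POSITION.items()
        if other != key and abs(r - r2) <= 1 and abs(c - c2) <= 1
    ]
    for key, (r, c) in POSITION.items()
}


def key_distance(a, b):
    """Number of hops on the shortest path between keys a and b (BFS)."""
    a, b = a.upper(), b.upper()
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        key, dist = queue.popleft()
        if key == b:
            return dist
        for nxt in GRAPH[key]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"no path between {a!r} and {b!r}")
```

Under these assumptions neighboring keys such as Q and W are at distance 1, while keys at opposite ends of a row, such as Q and P, are many hops apart.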
As with other distance metrics, this distance metric computes (n-1) comparisons for each word, thereby outputting the word with the closest distance, at O(n²) complexity. After this, the auto-suggest feature narrows down the scope to the rightful candidate key(s). The overall complexity is O(n²).
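The pairwise comparison can be sketched as below, with plain Levenshtein distance standing in for the QWERTY metric. The `closest_matches` helper is hypothetical and assumes a list of at least two distinct words; comparing each word against the other (n-1) words is what gives the O(n²) number of comparisons.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete a character from a
                    curr[j - 1] + 1,     # insert a character into a
                    prev[j - 1] + cost,  # substitute (or keep) a character
                )
            )
        prev = curr
    return prev[-1]


def closest_matches(words, distance=levenshtein):
    """Map each word to its nearest other word: n * (n - 1) comparisons."""
    result = {}
    for word in words:
        others = [other for other in words if other != word]
        result[word] = min(others, key=lambda other: distance(word, other))
    return result
```

Any other metric with the same two-argument signature can be passed in through the `distance` parameter, which keeps the quadratic loop separate from the choice of metric.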
Performance metrics & Benchmark tests
The way to measure the performance of this algorithm is to pass it through the corpus of a hand-collected data set and compare its performance (time complexity and results) against other distance metrics, like a pure Levenshtein distance application and Manhattan & Euclidean distance, though as a rule of thumb Levenshtein will do better on pure distance search than the other two (Manhattan & Euclidean distance).
While modeling this problem, I also found a version of fuzzy string matching here – http://search.cpan.org/~krburton/String-KeyboardDistance-1.01/KeyboardDistance.pm – and I will contrast the performance results against it (pending).
Suggestions and critique are, as always, welcome.
Stay Green & Growing,
Tailored from my original post at Grace Hopper’s Women in Computing, 2012, here.
Network, Network, Network – and do it right!
Networking is not “Foreplay” – and it isn’t creating Black holes either.
And here’s why:
Women haven’t traditionally had the benefit of “smoke-time” networking that seems to work so well for men (nor any womanized version of it). That said – surely there must be ways that amazing superwomen around the world are making it into the corner offices.
As a society, we have been forced to focus on collecting business cards rather than building relationships that can be leveraged practically. This is the equivalent of building black holes around us, in the hope that someday we will find matter that will fill the hole.
The problem with this approach is twofold: one, you are selling a bad product, and two, your self-promotional pitch was so unmemorable that the speaker decided not to set aside some processing power in her/his brain to remember you. No one remembers someone with an inflated sense of self-importance. There is a classic statistic from speed dating, which found that the men who hardly spoke but genuinely listened ended up getting dates more often than the men who spoke for a greater fraction of the time.
At the other end of the spectrum, I observe, often to my utter dismay, the fawning or servile approach. Rather than feeding their own selves, the focus shifts to the person whom they deem influential enough. It may not be a conscious effort, and may simply stem from a lack of awareness of one’s real do-ability. Specifically in Eastern cultures, including India, people are maneuvered to cushion their interests to the point of obscurity, which is why information and opportunities sometimes pass them by and they don’t even take notice. So, if you have a strong point, tie it together without taking the detour.
I said earlier that the focus should be on forging mutually beneficial relationships, and here’s why: when you focus on creating black holes, as I call them, you are missing out on accidental benefits – amazing support systems, sponsors who would endorse you or vouch for you, mentoring opportunities, or just plain unbiased viewpoints – when you need them. So share something phenomenal that people will remember: spin relevant stories, be inquisitive and keep re-learning, and most of all, stay true to who you really are.
Differentiate the position from the person, be assertive, don’t downplay yourself, and yet don’t oversell. It is a classical balance problem.
“Network, and network with an end-goal in mind, which is twofold: one, to create spheres of influence, and two, to create information advantage.” – Saundarya Rajesh, AVTAR
Four quick thoughts that sum this up:
1. Focus on the end goal and then work backward: Accidents -> acquaintances -> associates -> advocates -> allies. The point is simple: if you want to move from “foreplay” conversations to forging lasting, mutually beneficial relationships, you will invariably have to focus on quality.
2. Measure it: If you are not moving from acquaintances to allies over a period of time, the networks don’t serve a deeper meaning. You see, input alone doesn’t count; what you made of it does. The “measuring” bit doesn’t have to be as flashy as Excel, but should suffice to give you a reinforcing feedback loop on your overall interpersonal and communication skills.
3. Have a compelling elevator pitch: An elevator pitch is something that gives a compelling introduction to who you are. Craft one, and try to improvise depending on the listener and her/his background and intellectual giftedness. Of course, this means that you also adapt dynamically depending on how sophisticated the listener’s understanding of your profession is.
4. Demonstrate equality: You can’t have an enriching conversation when you do not consider yourself to be one among equals. Equality begets equality, and it begins with awareness of who you are and what you stand for. Reboot your operating system and get moving – and yes, practice, deliver and then practice again.
The theory of 10,000 hours, from different schools of thought, says that deliberate repetition is key. So go develop a lens for the world you want to live in, and grow – and then just reverse engineer what you want to achieve from any relationship that could help you towards it, and work your way through it. Of course, this assumes you focus on forging a mutually beneficial relationship and are not just trying to work your way into a parasitic one.
Guess what, we all like to push compelling and competitive candidatures.
Now, Go Rock!
Also published here - http://www.flexicareersindia.com/newsletter/jan2013/newsletter.htm
The larger question, and the meat of it: wouldn’t the user-pays subscription model work as well for Quora?
Here’s what I mean. As a user, I navigate about 10-20 questions on Quora every time I take out “time” I am willing to engage myself in. Then I go back and start to work on something, hit discussions in a multitude of channels, and I forget. That is when I want to refer to the same fact, in the organized way I first saw it in its context.
I think that is part of the opportunity for Quora. As Achilleas Vortselas mentioned (at Quora), “Quora as middleman”, I envision the next layer of structuring and personalization to build on top of the collective knowledge of the community.
I want to (and I hope that other users want to, as well) organize this information in a useful way. That could mean highlighting the pages that are “specific to me” only, like individual page reads. Similarly, as a user I would be willing to pay for premium services like organizing my knowledge and contacts, and since it is personal, it is unobtrusive to anyone else. It’s my personal learning dashboard that I can cross-refer to years from now, since I find information in a tiered fashion: one, by meta tags AND/OR (Quora) Boards, and two, by my personal notes.
[Hypothesis: Since we are building social capital here, the content management will take care of itself.]
Challenge & opportunity: Information retrieval from the end user’s perspective.
[The Public Good...]
With literally tons of CEOs and executives engaging in this model, subscription should be able to make money. Maybe, in the future, Quora would make part of its revenue from firms funding their top employees; that is the future of the Crowd Sourcing Revolution that we are beginning to see.
The future of this will be firms sponsoring employees or committing resources to work on abstract problems and new concepts, and “Grooming Aggregators”, since it will be far cheaper than the Home-Grown Talent model, in which talent is difficult to source, retain and keep engaged all through the project.
In some sense, it is how rating agencies make money. Everyone pools in because everyone else does, and that is the social good at its BEST.
[Hypothesis: Knowledge workers have always paid, and will continue to pay, for tools that make them more efficient.]
Challenge & opportunity: Of course, that needs high computing power and read-writes to store the user-specific information, and engineering a solution that can scale as well. Which is where the guiding rule of making money comes in: the money made is roughly equal to (Revenue - Cost of engineering the solution), and it should be positive.
So my quick points here:
User-pays subscription model
a. User-specific page read/write(s) and features to organize and personalise the information retrieval for later use.
b. Putting structure in YOUR personal network: listing and saving personal content/notes for lookup, to organize the contacts beyond the mere “followers and people who follow” paradigm.
c. The public good: the rise of crowdsourcing and social capital.
Business opportunity: Innovating on user-specific needs (behavioral & business needs) and tailoring subscription plans that match these needs.
PPS: Don’t pick on my words. “Make money” is a dirty rule, but that’s what we are talking about here.
This article is a part of my original piece at Quora here. Cosmetic and technical updates/stack probe in progress.
(Dear God, please tell my Mom I am doing very fine, and I still think that studying Quantitative Economics, and flying over 16 countries all the way was worth this Education)
Last week I was on my longest flight so far, from the far west to the far east, and on my way back I bumped into a gentleman who was coming back from Mongolia (the capital, Ulan Bator, to be precise). Talking with him made this one of my shortest trips ever. He was in the marketing side of construction, and I am a bottom-up economist and autodidact. So what could we talk about?
I had no clue about construction, or marketing in the construction business, or even Mongolia. All I knew about Mongolia was that it has huge deposits of coal. Some people might hit a stalemate at that (what can you really talk about?), while I was like: tell me more about it!
We talked about the Mongolian empire, how the Mongols built monasteries all the way to Hungary (we could not agree whether the Mongols came up to Hungary, but that is for some other time), its comparison with the Romans, and the influence of the Mongols on food. And seamlessly, we moved to the Chinese dynasties, the Han empire and the Qing dynasty, and how the emperors had tried to bring in Buddhism to bind people together, and then we moved to the disputes over Tibet and Taiwan, and what I had learnt at the Harvard Project for Asian & International Relations in Taipei, Taiwan, which I was just coming from.
Come to think of it, it was a powerful conversation (and I really learnt a lot). Now that is the thing: finding the CPI, or the common point of interest, sets you apart from the rest. And come to think of it, it’s really an art. For once he did not tell me what a superwoman I was (I like telling myself that I am one), but I instantly knew that he enjoyed talking to me as much as I did to him.
This ability to talk to people “about them” has been my journey of the last 4-5 years. I have practiced and matured it on CEOs** and professors, but that is not to brag; the point is about learning.
So my quick take-aways:
1. People are always willing to “Teach” you, once they see the intellect + energy + enthusiasm in you to learn. This is how we grow, and it is a more sustainable way of learning than feeding some decaying facts into your brain.
2. You never choose your mentor. Your mentor chooses you. Point. (Borrowed from Indra Nooyi)
3. Always push meritocratic people, even if you have to go out of THE way to get them some limelight, and then just keep passing the ball.
4. You don’t ask, you don’t get. Underrated, over-said, but true.
5. No matter what you want to learn or know, stop the foreplay and get to the CPI quickly enough to hold them.
6. What goes around comes around. Gratitude cannot be taught; practice it.
7. There is no single character that can beat your Genuineness – you will know it when it has arrived to you.
8. Push & stretch yourself, and then stretch yourself a little more.
9. You are beautiful because of your words and your self-awareness, and that is what sets you apart and makes a Brand “YOU” in this dynamic, very competitive world.
10. We are more than the organizations and schools we represent, and in some way the hierarchies of power are shifting bottom-up. How YOU speak really defines your school, your community, your organisation and your people.
11. NEVER stop learning.
12. Everything around you has always been there, but when you meet people who have been there, done that, it sets a context that you can build on, as if things, people, places and empires have suddenly started to exist in your consciousness. Beg, borrow, steal, but GET that context right in your head; you will be amazed at how much you can really learn, after all.
13. Stop being mediocre, and stop when you have made your point. And hold your drink, if you don’t have that super take-away :)
(Emerging ideas from my book – 20’s is the new 30’s: planning the dots you will want to connect backward. Dedicated to Annie Fan, a superwoman who knows how to listen, and listen well)
** John Kearon, CEO, Brainjuicer, who even shaped 4 months of my work – my Master’s thesis – on how the way we think we think is not how we really think. In some ways, that is THE bit about connecting the dots.
Traditionally, the goal of e-governance projects across the world has been broadly twofold: ensuring a faster service time to address the pain points, and lowering the cost of delivering public services. However, as new technologies such as social media and Big Data emerge, governments should rethink e-governance not merely as technology-centric, but as a tool for participatory public policy.
The possibilities that social media has opened up for governments are many. For starters, consider the GovLab, an initiative in the USA which works with senior government executives and thought leaders from across the globe, and runs controlled experiments for better public policy while also helping reduce the costs of providing services. Or consider the @sweden initiative by the government of Sweden, which along with an advertising company runs a Twitter account controlled by ordinary citizens for seven days each, on a rolling basis. This project aims at better governance and engagement of its citizens by amplifying their voice in a transparent manner, while also supporting the tourism industry in Sweden.
At the other end of the spectrum are projects aimed at reinforcing good citizenship based on behavioural economics: across health, reducing fecal pollution by dogs by inducing dog owners to clean up (Taiwan), and creating incentives to ask for receipts and so beat corruption in tax compliance (mainland China). These are experiments the rest of the world is exploring to change behaviours in ways that stick.
In the Indian context, now consider delivering mobile health care, where social media listening tools can offer countless opportunities to help deploy resources in an optimized manner, allowing for efficient delivery to the “last mile” and for developing key infrastructure and utilities in the currently underserved districts. Or consider supplementing the RBI’s latest e-governance application and online tracking system in its foreign exchange department with social media initiatives such as those developed by Clemson University, USA, which used social media listening tools to predict the direction of the stock market and foreign exchange. Having similar projects can help triangulate consumer and foreign investor sentiment, helping central banks and governments handle pressing monetary policy issues in a dynamic manner. The benefits are manifold: allowing for better planning, and attracting fleeting FDI by signalling the trends in the consumer market, thereby reinforcing that India is indeed a key market from where future growth will emerge.
Or look at the aviation, FMCG or telecommunication sectors. By bringing symmetry to the information in the markets, we can supplement existing institutions like the Competition Commission of India (CCI) to extract useful information on consumer surplus, to help decide how best to let the competitive landscape emerge and help businesses grow. Supplementing their insight with an additional triangulation of relevant and contemporaneous data will allow us to do what is actually in the consumer interest, rather than what “they” think it is. This is the helm of participatory governance.
Bringing in such innovative, disruptive projects will also help entice top talent into the public sector and help revive the competitive landscape by restricting the brain drain to the much coveted private sector. Using the wisdom of crowds and crowdsourcing its problems will create a rich, vibrant pool of ideas in a fair, meritocratic manner, while keeping the overall costs of the project considerably low. We opened up our markets to the world in 1991, and now it is time to open up governance to our citizens and best brains, by making it relevant. Of course, like all sectors, in its bare bones it has manifold challenges. The first is cultivating the mindset, but the good news is the developing trend of today’s dynamic youth to lead parallel careers and consultative projects for the social good. Then there is a disheartening, mere 10.2% internet penetration in India, and the need for regulating the “social listening” if at all such pilot programs are launched. Yet against all of this, we together are more resourceful than the resources that constrain us.
In conclusion, amplifying the voice of the citizens in a transparent manner can help governments do a cost-benefit analysis of which public goods to develop, allowing for a better five-year plan, essentially outsourcing its public policy and developing a truly meritocratic governance. This, in true terms, is the largest (democracy): for the people, of the people and by the people.
The West is changing its traditional structures, and shifting the democracies bottom up – it’s time we caught up, too. Incredible India, after all.
Posted on: July 13, 2012
[In hindsight, everything falls in place, everything.. ]
No matter how well I perform, I am never satisfied with my performance in the “Blue Book”. One reason for this is that my handwriting has always been so bad that, if asked to read it myself, it would take me an hour to decipher my own letters. (Thankfully I now type all my assignments.)
So, taking you back in time to 2006, and one of these “Blue Book Blues” chapters. As always, I was not happy with how I had performed. And I found myself sitting before the head of my department, Prof. Nitin V Pujari, in tears over a mark, to which he asked, “It is just one mark, does it really matter enough to weep?”
Sulking and clearing my choked throat, I replied, “Yes Sir, because I have worked for this extra mark.” He not only gave me that extra mark, but also told me something that I will never forget, and it all comes back as déjà vu to me, even to this day.
“Tomorrow you will graduate and today you are more than willing to work (in Industry). But day after tomorrow you will want to come back and go back to study again. That is life, but you will not understand it today.”
Like everyone else, while at college I always considered myself to be a hard-working student. But life being what it really is, I seldom looked at opportunities to grow further as a person, or to collaborate with a network of researchers or passionate fellow students who were then building software to remotely manage their personal computers, or rewiring the bus(es) in their computers with their newfound knowledge of integrated circuits. As for my 2006 version, I never understood how to apply my newfound knowledge, one that consumed 4 years of my life, and which I got by securing an All India rank of 188, something I am proud of to this day.
So we know there is a gap, with something amiss: “something” we expect “somebody” to give us “somewhere”. Our parents, and often we ourselves, come to think of this “something” as “Education”, “somebody” as “Teachers”, and “somewhere” as the “College”. The question that then comes is threefold: one, how to apply it; two, how to make the most of your education; and three, how did I change, and how can you evolve too? (Hopefully for the better.) I will try to answer all these questions in sequence, borrowing ideas from my forthcoming book.
The reason we don’t apply is that we either do not know how, or we focus too much energy on escaping the rat race rather than on building our skill sets and grooming ourselves. Anyone who understands basic demand and supply can intuitively understand that if there is a high supply of a skill, neck-and-neck competition will follow to supply the limited demand.
Unless you really know what you want to do with your life, when you leave college you will really be generalists, certified to be moulded into organizations. And as organizations start to demand plug-and-play workers, this will mean that you will be judged in your respective jobs on what you bring to the table, and this will eventually affect how you shape your career and how soon you grow, both personally and in your careers.
How to make the most of your education is really a very personalized blueprint, one that I wish all students would develop while still at college. In essence this links to the “Application of ideas” I mentioned above, and would require expanding the “eco-system of your skill sets” in the direction of where you really want to grow. This means that to plan your dots backward, you will have to CREATE those dots in the first place.
And lastly, how do you start? Well, there is no one single answer. It is about stretching yourself and then stretching a little more, for willpower is like a muscle that has to be exercised every single time you are faced with something you think you can remotely achieve. Organizations and their needs are changing, and with that you have to evolve too, and fast enough; and most importantly, continue to invest in yourself, not for a job, but for a life-long process of learning and growing as a person.
When you really look at life, it is just a series of actions that we consciously choose to do, which shape our perceptions and thought processes, and in these formative years it is this thought process that decides how soon you will get there. Your mind can achieve truly miraculous feats, if you let it. It is about learning the difference between information, which is just rapidly aging facts, and knowledge: to truly learn, we must make sense of this information and use it to build knowledge.
When I was a Bachelor’s student, I had never imagined I would change lanes completely from Computer Science to Quantitative Economics, my current Master’s, which I will shortly finish. In this time, I have travelled to 6 countries across 2 continents, failed another exam, and rebuilt myself all over again in a completely new culture and language I knew nothing about. I have found in myself the courage to shut up that voice that tells me why I can’t do what I set out to do, and to re-evolve and compete with my international peers in a Ph.D. program, when I did not even have a Bachelor’s education in Economics.
I have matured as a person, learnt how our inactions count as much as our actions, and most importantly, I have learnt how to listen and express gratitude for all that I have. And for who I am today, I have to thank my teachers, for telling me all this even when I was not receptive, for otherwise I would have missed this boat, which I now call life, altogether.