Protected: Don’t be so focussed on the greener grass on the other side that you forget to water your own plants – aka being a superwoman
Posted on: October 8, 2015
Posted on: July 20, 2013
For hand-collected or hand-crafted data sets, such as manually keyed survey data and other data sets where the typing experience of the person keying in the data on a digital device falls short of an acceptable standard, the conventional distance metrics will fail. This is because these distance metrics “assume” that the data representation in the data set is a proxy for the intent of the end user, which is not true when unintentional errors creep in.
This requires a way to model the unintentional errors that have crept in, and to model the very structure of the keyboard that is used to key in the data. In this algorithm, I explicitly model the QWERTY keyboard as a graphical entity, with the 26 letters of the English alphabet as the nodes of a graph & the distance between the nodes accounting for the
Scope: This algorithm finds the similar entities among the many in a hand-crafted data set. For example, given the two entities “Weka” and “Wkea”, it will identify them as one entity and, with a spell checker in the final disambiguating pass, also identify which of the two is the actual entity.
The limitation of this apparatus, though, is words that are not in the English dictionary and so get no auto-suggestion, such as people's names like “Ekta” and “Keta”. In that case it will still identify the typo and mark the two entities as the same, but it cannot say which one of them is actually the right representation.
In that sense, this application predicts intent in a global scope, but does NOT identify which of the individual entities is the actual representation.
The QWERTY representation
Algorithm & data structure representation:
- The three rows of English letters on the QWERTY keyboard are represented as nodes of the graph, with a distance of 1 between adjacent nodes. Adjacency is defined this way because typographical errors are more likely to occur in a neighboring zone. For example, according to the schema above, Q is adjacent to W, A and S; the adjacency is in turn represented by the neighborhood of that node. For D as the key, W, E, R, S, F, X, C and V are its immediate neighborhood.
- The concept of the neighborhood, as above, also narrows down which keys are acceptable typos in the immediate neighborhood, so that the distance metric does not penalize the user.
- An extension of adjacent nodes is the case where users inadvertently swap the left and right keys, such as typing “wkea” instead of “weka” because the user's fingers are not synced up while typing. This can be due to inexperience in typing, or to pure errors while speed-typing (typical of surveys, call center chat data etc.). In this scope the comparison, instead of being against the neighborhood/adjacent nodes as above, is against the left vs. right portions of the QWERTY keyboard, with defined positions for what is left and right. This ensures that when two consecutive characters are deemed “typos emanating from left and right hand asynchronization”, these characters do indeed come from the left and right parts of the keyboard.
- As the next step, the threshold for an acceptable difference between the words is defined. Currently this is set as one-third of the absolute difference in the length of the words being compared. This is because the difference in the lengths of the words being compared has to be within a defined limit. For example, a user may type “asynchronus” instead of “asynchronous” (missing the “o” after “n”); given the threshold restrictions, the distance metric should uniquely attribute this to “asynchronous”. (In this case, it will happen in the first auto-suggest step itself, thereby narrowing the scope, or reinforcing the uniqueness of the attribution.)
- If the difference in the lengths of the words exceeds the threshold, the algorithm places a higher belief that the words are different and uses the Levenshtein distance metric instead.
- As a final step, to identify which of the two keys is the rightful key (“Lengt” and “length” are both mapped to one entity, but the algorithm till step 4 does not know which is the right “word”), the word being compared is passed through an “auto-suggest”, which looks up predefined words in the English dictionary and outputs the candidate(s) with the lowest distance metric.
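The neighborhood and left/right-swap checks described in the steps above can be sketched roughly as follows. This is a minimal illustration in plain Python rather than my networkx code: the grid layout, the `LEFT_HAND` split and every function name here are assumptions made for the example.

```python
# Minimal sketch of the QWERTY neighborhood and left/right-swap checks.
# The grid layout, the hand split and all names are illustrative assumptions.

ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Assumed touch-typing split: these keys belong to the left hand.
LEFT_HAND = set("qwertasdfgzxcvb")


def build_neighbors(rows):
    """Map each key to the keys one grid step away (distance 1)."""
    neighbors = {}
    for r, row in enumerate(rows):
        for c, key in enumerate(row):
            near = set()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    # Skip the key itself and rows that do not exist.
                    if (dr, dc) == (0, 0) or not 0 <= rr < len(rows):
                        continue
                    if 0 <= cc < len(rows[rr]):
                        near.add(rows[rr][cc])
            neighbors[key] = near
    return neighbors


NEIGHBORS = build_neighbors(ROWS)


def is_neighbor_typo(a, b):
    """True when a and b differ by exactly one adjacent-key substitution."""
    if len(a) != len(b):
        return False
    diffs = [(x, y) for x, y in zip(a.lower(), b.lower()) if x != y]
    return len(diffs) == 1 and diffs[0][1] in NEIGHBORS.get(diffs[0][0], set())


def is_hand_swap_typo(a, b):
    """True when b is a with two consecutive keys swapped across hands."""
    a, b = a.lower(), b.lower()
    if len(a) != len(b):
        return False
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    if len(diff) != 2 or diff[1] != diff[0] + 1:
        return False
    i, j = diff
    swapped = a[i] == b[j] and a[j] == b[i]
    across_hands = (a[i] in LEFT_HAND) != (a[j] in LEFT_HAND)
    return swapped and across_hands
```

With this layout, `NEIGHBORS["d"]` comes out as W, E, R, S, F, X, C and V, matching the example in the first step, and `is_hand_swap_typo("Weka", "Wkea")` is true because “e” sits on the left half and “k” on the right.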
As a proposed extension to this algorithm, I intend to try out different dictionary approaches for auto-suggest (together with Apache Solr and Elasticsearch).
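As a rough illustration of the auto-suggest pass, here is a sketch that uses Python's standard `difflib` as a stand-in for a real dictionary service. Note that `difflib` ranks by a similarity ratio, not by the QWERTY or Levenshtein distance, and the tiny `WORDS` list and both function names are assumptions made up for the example.

```python
import difflib

# Toy stand-in for an English dictionary; purely illustrative.
WORDS = ["length", "asynchronous", "weka"]


def auto_suggest(word, dictionary, n=1):
    """Return up to n dictionary words that most resemble the input."""
    return difflib.get_close_matches(word.lower(), dictionary, n=n)


def pick_rightful(candidates, dictionary):
    """Return the first candidate spelling found in the dictionary, else None."""
    for word in candidates:
        if word.lower() in dictionary:
            return word
    return None
```

Here `auto_suggest("Lengt", WORDS)` suggests “length”, while `pick_rightful(["Ekta", "Keta"], WORDS)` returns `None`, which is the limitation with people's names noted earlier.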
I used Python’s networkx to model the QWERTY keyboard, and built the QWERTY distance metric on top of it.
As in other distance metrics, this distance metric computes (n-1) comparisons for each word, thereby outputting the word with the closest distance, at O(n^2) complexity. After this, the auto-suggest feature narrows the scope down to the rightful candidate key(s). Overall the complexity is O(n^2).
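To make the (n-1) comparisons per word concrete, here is a small sketch of the pairwise scan. It plugs in a plain Levenshtein distance as the default, since that is the fallback metric named above; `closest_matches` and its signature are hypothetical names for this example, and any other distance function could be passed in instead.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def closest_matches(words, distance=levenshtein):
    """For each word, scan the other n-1 words and keep the nearest one.

    Needs at least two words. The nested scan is what gives O(n^2)
    distance computations overall.
    """
    result = {}
    for i, word in enumerate(words):
        others = [w for j, w in enumerate(words) if j != i]
        result[word] = min(others, key=lambda w: distance(word, w))
    return result
```

For the list `["weka", "wkea", "length", "lengt"]`, this pairs “weka” with “wkea” and “length” with “lengt”.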
Performance metrics & Benchmark tests
The way to measure the performance of this algorithm is to pass a hand-collected data set through it and compare its performance (time complexity and results) against other distance metrics, such as a pure Levenshtein distance application and Manhattan & Euclidean distance, though as a rule of thumb Levenshtein will do better on pure distance search than the other two (Manhattan & Euclidean distance).
While modeling this problem, I also found a version of fuzzy string matching at http://search.cpan.org/~krburton/String-KeyboardDistance-1.01/KeyboardDistance.pm and I will contrast the performance results against it (pending).
Suggestions and critique are, as always, welcome.
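A benchmark of this kind could be wired up along the following lines. This is only a sketch: the `benchmark` function, the “distance at or below a threshold means same entity” rule and the three toy pairs are assumptions for illustration, not results from any real data set.

```python
import time


def edit_distance(a, b):
    """Plain Levenshtein distance, used here as the baseline metric."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def benchmark(metrics, labeled_pairs, threshold):
    """Return {name: (accuracy, seconds)} for each metric.

    A pair is predicted "same entity" when its distance is <= threshold.
    """
    report = {}
    for name, metric in metrics.items():
        start = time.perf_counter()
        hits = sum(
            (metric(a, b) <= threshold) == same for a, b, same in labeled_pairs
        )
        elapsed = time.perf_counter() - start
        report[name] = (hits / len(labeled_pairs), elapsed)
    return report


# Toy, made-up pairs purely for illustration (word, word, same entity?).
PAIRS = [
    ("weka", "wkea", True),
    ("lengt", "length", True),
    ("weka", "length", False),
]
```

Calling `benchmark({"levenshtein": edit_distance}, PAIRS, threshold=2)` returns an accuracy and an elapsed time per metric, so the QWERTY metric and the other candidates can be added to the same dictionary and compared side by side.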
Stay Green & Growing,
Tailored from my original post at Grace Hopper’s Women in Computing, 2012.
Network, Network, Network – and do it right!
Networking is not “Foreplay” – and it isn’t creating Black holes either.
And here’s why:
Women haven’t traditionally had the benefit of “smoke-time” networking that seems to work so well for men (nor any womanized version of it). That said – surely there must be ways that amazing superwomen around the world are making it into the corner offices.
As a society, we have been forced to focus on getting business cards rather than building relationships that can be leveraged practically. This is the equivalent of building black holes around us, in the hope that someday we will find matter that will fill the hole.
The problem with this approach is twofold: one, you are selling a bad product, and two, your self-promotional pitch was so unmemorable that the speaker decided not to set aside some processing power in her/his brain to remember you. No one remembers someone with an inflated sense of self-importance. There is a classic statistic from speed dating, which found that the men who hardly spoke but genuinely listened ended up getting dates more often than the men who spent a greater fraction of the time speaking.
At the other end of the spectrum, I observe, often to my utter dismay, the fawning or servile approach. Rather than feeding their own self, the focus reverses to the person whom they deem influential enough. It may not be a conscious effort, and may simply stem from a lack of awareness of one’s real do-ability. Specifically in Eastern cultures, including India, people are maneuvered to cushion their interests to the point of obscurity, which is why information and opportunities sometimes pass them by, and they don’t even take notice. So, if you have a strong point, tie it together without taking the detour.
I said earlier that the focus should be on forging mutually beneficial relationships, and here’s why: when you focus on creating black holes, as I call them, you are missing out on accidental benefits: amazing support systems, sponsors who would endorse you or vouch for you, mentoring opportunities, or just plain unbiased viewpoints when you need them. So share something phenomenal that people will remember: spin relevant stories, be inquisitive and keep re-learning, and most of all, stay true to who you really are.
Differentiate the position from the person, be assertive, don’t downplay yourself, and yet don’t oversell. It is a classical balance problem.
“Network, and network with an end goal in mind, which is twofold: one, to create spheres of influence, and two, to create information advantage.” - Saundarya Rajesh, AVTAR
Four quick thoughts that sum this up:
1. Focus on the end goal and then work backward: Accidents -> acquaintances -> associates -> advocates -> allies. The point is simple: if you want to move from “foreplay” conversations to forging lasting, mutually beneficial relationships, you will invariably have to focus on quality.
2. Measure it: If you are not moving from acquaintances to allies over a period of time, the networks don’t serve a deeper meaning. You see, input alone doesn’t count; what you made of it does. The “measuring” bit doesn’t have to be as flashy as Excel, but should suffice to give you a reinforcing feedback loop on your overall interpersonal and communication skills.
3. Have a compelling elevator pitch: An elevator pitch is something that gives a compelling introduction to who you are. Craft one, and try to improvise depending on the listener and her/his background and intellectual giftedness. Of course, this means that you also adapt dynamically to the sophistication that the listener might have about your profession.
4. Demonstrate equality: You can’t have an enriching conversation when you do not consider yourself to be one among equals. Equality begets equality, and it begins with awareness of who you are and what you stand for. Reboot your operating system and get moving – and yes, practice, deliver and then practice again.
The theory of 10,000 hours, from different schools of thought, says that deliberate repetition is key. So go develop a lens for the world you want to live in, and grow. Then just reverse engineer what you want to achieve out of any relationship that could help you towards it, and work your way through it. Of course, this assumes you focus on forging a mutually beneficial relationship and are not just trying to work your way into a parasitic one.
Guess what, we all like to push compelling and competitive candidatures.
Now, Go Rock!
Also published here – http://www.flexicareersindia.com/newsletter/jan2013/newsletter.htm
The larger question, and the meat of it: wouldn’t the user-pays subscription model work as well for Quora?
Here’s what I mean: as a user, I navigate about 10-20 questions on Quora every time I take out “time” I am willing to engage myself in. Then I go back and start to work on something, hit discussions in a multitude of channels, and I forget. That is when I want to refer to the same fact, in the organized way I first saw it in its context.
I think that is part of the opportunity for Quora. As Achilleas Vortselas mentioned (at Quora), “Quora as middleman”, I envision the next layer of structuring and personalization to build on top of the collective knowledge of the community.
I want to (and I hope that other users want to, as well) organize this information in a useful way. That could mean highlighting pages in a way that is “specific to me” only, like individual page reads. Similarly, as a user I would be willing to pay for premium services like organizing my knowledge and contacts; since it is personal, it is non-obtrusive to anyone else. It’s my personal learning dashboard that I can cross-refer to years from now, since I find information in a tiered fashion: one, by meta tags AND/OR (Quora) Boards, and two, by my personal notes.
[Hypothesis: Since we are building social capital here, the content management will take care of itself.]
Challenge & opportunity: Information retrieval from the end user’s perspective.
[The Public Good…]
With literally tons of CEOs and executives engaging in this model, subscription should be able to make money. Maybe, in the future, Quora would make part of its revenue from firms funding their top employees; that is the future of the crowdsourcing revolution that we are beginning to see.
The future of this will be firms sponsoring employees and committing resources to work on abstract problems and new concepts, and “Grooming Aggregators”, since it will be far cheaper than the home-grown talent model, where talent is difficult to source, retain and keep engaged throughout the project.
In some sense, it is how Rating agencies make money. Everyone else pools in, because everyone else does, and that is the social good at its BEST.
[Hypothesis: Knowledge workers have always paid, and will continue to pay, for tools that make them more efficient.]
Challenge & opportunity: Of course, that needs high computing power and read-writes to store the user-specific information, and engineering a solution that can scale as well. Which is where the guiding rule of making money comes in: money made is roughly equal to (Revenue - Cost of engineering the solution), and it should be positive.
So my quick points here:
User-pays subscription model
a. User-specific page read/write(s) and features to organize and personalise information retrieval for later use.
b. Putting structure in YOUR personal network: listing and saving personal content/notes for lookup, to organize contacts beyond the mere “followers and people who follow” paradigm.
c. The public good: the rise of crowdsourcing and social capital.
Business opportunity: Innovating on user-specific needs (behavioral & business needs) and tailoring subscription plans that match these needs.
PPS: Don’t pick on my words 🙂 “Make money” is a dirty rule, but that’s what we are talking about here.
This article is a part of my original piece at Quora. Cosmetic and technical updates/stack probe in progress.
(Dear God, please tell my Mom I am doing very fine, and I still think that studying Quantitative Economics, and flying over 16 countries all the way was worth this Education)
Last week I was on my longest flight so far, from the far west to the far east, and on my way back I bumped into a gentleman who was coming back from Mongolia (the capital, Ulan Bator, to be precise). Talking with him made this one of my shortest trips ever. He was on the marketing side of construction, and I am a bottom-up economist and autodidact. So what could we talk about?
I had no clue about construction, or marketing in the construction business, or even Mongolia. All I knew about Mongolia was that it has huge deposits of coal. Some people might hit a stalemate at that (what can you really talk about?), while I was like: tell me more about it!
We talked about the Mongolian empire, how the Mongols built monasteries all the way to Hungary (we could not agree whether the Mongols came up to Hungary, but that is for some other time), its comparison with the Romans, and the influence of the Mongols on food. And seamlessly, we moved to the Chinese dynasties, the Han empire and the Qing dynasty, and how the emperors had tried to bring in Buddhism to bind people together. Then we moved to the dispute over Tibet and Taiwan, and what I had learnt at the Harvard Project for Asian & International Relations in Taipei, Taiwan, that I was just coming from.
Come to think of it, it was a powerful conversation (and I really learnt a lot). Now that is the thing: finding the CPI, or the common point of interest, sets you apart from the rest. And come to think of it, it’s really an art. For once he did not tell me what a superwoman I was (I like telling myself that I am one), but I instantly knew that he enjoyed talking to me as much as I enjoyed talking to him.
This being able to talk to people “about them” has been my journey of the last 4-5 years. I have practiced and matured it on CEOs** and professors, but that is not to brag; the point is about learning.
So my quick take-aways:
1. People are always willing to “Teach” you, once they see the intellect + energy + enthusiasm in you to learn. This is how we grow, and it is a more sustainable way of learning than feeding some decaying facts into your brain.
2. You never choose your mentor. Your mentor chooses you. Point. (Borrowed from Indra Nooyi)
3. Always push meritocratic people, even if you have to go out of THE way to get them some limelight, and then just keep passing the ball.
4. You don’t ask, you don’t get. Underrated, over-said, but true.
5. No matter what you want to learn or know, stop the foreplay, and get to the CPI quickly enough to hold them.
6. What goes around comes around. Gratitude cannot be taught; practice it.
7. There is no single character that can beat your Genuineness – you will know it when it has arrived to you.
8. Push & stretch yourself, and then stretch yourself a little more.
9. You are beautiful because of your words and your self-awareness, and that is something that sets you apart and makes a Brand “YOU” in this dynamic, very competitive world.
10. We are more than the organizations and schools we represent, and in some way the hierarchies of power are shifting bottom up. How YOU speak really defines your school, your community, your organisation and your people.
11. NEVER stop learning.
12. Everything around you has always been there, but when you meet people who have been there, done that, it sets a context that you can build on, as if things, people, places and empires have suddenly started to exist in your consciousness. Beg, borrow, steal, but GET that context right in your head; you will be amazed at how much you can really learn, after all.
13. Stop being mediocre, and stop when you have made your point. And, hold your drink, if you don’t have that super take-away :)
(Emerging ideas from my book – 20’s is the new 30’s, planning the dots you will want to connect backward. Dedicated to Annie Fan, a superwoman who knows how to listen, and listen well)
** John Kearon, CEO of Brainjuicer, who even shaped 4 months of my work, my Master’s Thesis, on how the way we think we think is not how we really think. In some ways, that is THE bit about connecting the dots.
Traditionally, the goal of e-governance projects across the world has been, broadly, twofold: ensuring a faster service time to address the pain points, and lowering the cost of delivering public services. However, as new technologies such as social media and big data emerge, governments should rethink e-governance not merely as technology-centric, but as a tool for participatory public policy.
The possibilities that social media has opened up for governments are many. For starters, consider the GovLab, an initiative in the USA which works with senior government executives and thought leaders from across the globe, and runs controlled experiments for better public policy while also helping reduce the costs of providing services. Or consider the @sweden initiative by the government of Sweden, which along with an advertising company runs a Twitter account controlled by ordinary citizens for seven days at a time, on a rolling basis. This project aims at better governance and engagement of its citizens by amplifying their voice in a transparent manner, while also supporting the tourism industry in Sweden.
At the other end of the spectrum are projects aimed at reinforcing good citizenship based on behavioural economics: across health, reducing fecal pollution by dogs by inducing dog owners to clean up (Taiwan), and creating incentives to ask for receipts and beat corruption in tax compliance (mainland China). These are experiments the rest of the world is exploring to change behaviours in ways that stick.
In the Indian context, consider delivering mobile health care, where social media listening tools can offer countless opportunities to help deploy resources in an optimized manner, allowing for efficient delivery to the “last mile” and developing key infrastructure and utilities in the currently underserved districts. Or consider supplementing the RBI’s latest e-governance application and online tracking system in its foreign exchange department with social media initiatives such as those developed by Clemson University, USA, which used social media listening tools to predict the direction of the stock market and foreign exchange. Having similar projects can help triangulate consumer and foreign investor sentiment, helping central banks and governments to handle pressing monetary policy issues in a dynamic manner. The benefits are manifold: allowing for better planning, and attracting the fleeting FDI by signalling the trends in the consumer market, thereby reinforcing that India is indeed a key market from where future growth will emerge.
Or look at the aviation, FMCG or telecommunication sectors: by bringing symmetry to the information in the markets, we can supplement existing institutions like the Competition Commission of India (CCI) to extract useful information on consumer surplus, to help decide how best to let the competitive landscape emerge and help businesses grow. By supplementing their insight with an additional triangulation of relevant and contemporaneous data, this will allow us to do what is actually in the consumer interest, rather than what “they” think it is. This is the helm of participatory governance.
Bringing in such innovative, disruptive projects will also help entice top talent into the public sector and help revive the competitive landscape by restricting the brain drain to the much coveted private sector. Using the wisdom of crowds and crowdsourcing its problems will create a rich, vibrant pool of ideas in a fair, meritocratic manner, while keeping the overall costs of the project considerably low. We opened up our markets to the world in 1991, and now it is time to open governance to our citizens and best brains, by making it relevant. Of course, like all sectors, in its bare bones it has manifold challenges. The first is cultivating the mindset, but the good news is the developing trend of today’s dynamic youth to lead parallel careers and consultative projects for the social good. Then there is a disheartening, mere 10.2% internet penetration in India, and the need for regulating the “social listening” if at all such pilot programs are launched. Yet against all of this, we together are more resourceful than the resources that constrain us.
In conclusion, amplifying the voice of the citizens in a transparent manner can help governments do a cost-benefit analysis of which public goods to develop, allowing for a better five-year plan, essentially crowdsourcing its public policy and developing a truly meritocratic governance. This, in true terms, is the largest (democracy): for the people, of the people and by the people.
The West is changing its traditional structures, and shifting the democracies bottom up – it’s time we caught up, too. Incredible India, after all.