Machine Learning and Data Science Intern at Affectly (Automated fundraising platform)               

                                                                                                       Dec 2015 - Jan 2016

  • Crowdsourcing and Natural Language Processing for Twitter profile analysis

  • Data analysis in Python and MongoDB manipulation; implemented combined machine learning models

During my ML internship at Affectly, I researched crowdsourcing for Twitter profile analysis and implemented my own algorithm to recommend impactful Twitter accounts for a client’s fundraising. I wrote my own Python crawler on top of the Twitter API and designed a pipeline that continuously crawled data into a MongoDB database. I used the Natural Language Toolkit (NLTK) for preprocessing steps such as tokenization and part-of-speech tagging. I also used Principal Component Analysis (PCA) to reduce dimensionality so that I could cluster and visualize the distribution of Twitter users. I compared the relevance among users by combining several measures: transforming users’ data into tf-idf vectors with Gensim and calculating their cosine similarity; using WordNet to calculate semantic similarity; using Latent Dirichlet Allocation (LDA) to calculate cosine similarity over higher-level topic representations; and, lastly, training a Support Vector Classifier (SVC) to calculate the similarity between users.
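
The tf-idf and cosine similarity step can be illustrated with a minimal sketch. It is not the code from the internship: the original work used Gensim, while this version uses only the Python standard library so that it stays self-contained. The user names, the sample texts, and the helper names `tfidf_vectors` and `cosine_similarity` are hypothetical, and the weighting shown (relative term frequency times log inverse document frequency) is one common variant that may differ from Gensim's defaults.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, already-tokenized text for three users (illustrative only).
users = {
    "user_a": "charity fundraising for clean water".split(),
    "user_b": "fundraising campaign for clean water projects".split(),
    "user_c": "football scores and transfer news".split(),
}

names = list(users)
vecs = tfidf_vectors([users[name] for name in names])

# Compare the first user against every other user.
for name, vec in zip(names[1:], vecs[1:]):
    print(names[0], "vs", name, round(cosine_similarity(vecs[0], vec), 3))
```

In this toy data, the two users who post about fundraising share weighted terms and therefore score higher against each other than either does against the user who posts about football.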