Homepage of
      G. Praveen Kumar

142 Halsey Dr, Apt 7
West Lafayette, Indiana - 47906
Phone : +1 765 637 1367
Email : gpraveenkumar5[at]ymail[dot]com
Homepage : http://gpraveenkumar.com
Resume | CV
If we knew what it was we were doing,
it would not be called research, would it?
-- Albert Einstein

Biography


I am a fifth-year PhD student in the Department of Computer Science, Purdue University. I work with Professors Jennifer Neville and Luo Si on problems related to Social Networks and Machine Learning. My expertise is in the areas of Modelling, Classification, Clustering, Label Propagation, Data Science and the like. I aspire to become a Data Scientist. Prior to joining Purdue, I worked for a year as an Associate IT Consultant at ITC Infotech, Bangalore, India. Before that, I completed my Bachelor’s degree in Computer Science and Engineering with honours at the National Institute of Technology, Durgapur, India. I am actively seeking collaborators in my areas of research interest.

Research Interests

  • Machine Learning
  • Social Network Analysis
  • Data Mining
  • Information Retrieval
  • Big Data Analytics
  • Natural Language Processing
  • Semantic Web
  • Text Mining
  • Cryptography and Network Security

Research Experience

  • Research Assistant at CS Department
  • Jan 12 - Present
    • Improving classification accuracy by replicating the training data is a well-studied problem in the text domain. Techniques like Marginalized Denoising Autoencoders have improved classification performance by marginalizing, or taking an expectation, over the training data without actually replicating it. Similar approaches to problems like label prediction and link prediction in the network domain have not been explored. Initial results we obtained for label prediction after replicating data by flipping labels, dropping nodes and dropping/rewiring edges are promising.
      Project Guide: Dr. Jennifer Neville
      CS Purdue University
    • Built a system that captures code as students write programs in the Alice programming language. Built a tutor out of the While module to give live feedback and decide student promotions based on the code captured. Developing a recommendation system that points out common programming mistakes so that students can avoid them and have a better programming experience.
      Project Guide: Dr. Luo Si, Dr. Buster Dunsmore, Dr. Steve Cooper
      CS Purdue University, CS Purdue University, CS Stanford University
    • Built a couple of modules in a Math Tutoring System for Students with Learning Disabilities. This intelligent tutoring system for math problems was built using Adobe Flash. I also made regular school visits to work with students in Grades 3, 4 and 5, observing how they interacted with the UI in order to incrementally improve the quality of the software.
      Project Guide: Dr. Luo Si, Dr. Yan Ping Xin
      CS Purdue University, College of Education Purdue University
  • Research Assistant at ITaP (IT at Purdue)
  • August 11 - Dec 11
  • Research intern at Knowledge and Data Engineering, Germany
  • May 09 - July 09
    • Interned with the Knowledge and Data Engineering Research Group at the University of Kassel. During this period, I worked with Professor Dr. Gerd Stumme and members of the group on a research project, "Semantic Analysis in Query Log Data", and built a "Similarity Framework" in Java that integrates Perl scripts for computing similar tags in folksonomies or similar data. The results of our work were published at ECML PKDD 2009.

    Work Experience

  • Data Scientist intern at Apple
  • May 15 - Aug 15
    • Worked in the iAd team on User Segmentation and Behavioural Targeting. Ideated and prototyped a new product, Lookalike Segments. Used Latent Semantic Analysis (SVD) to find latent features/traits among users. Segmented users based on their click behaviour patterns, their relation to apps and latent features. Used Hive to preprocess data, and Python and R for data processing. Built a visualization tool using D3 to qualitatively analyze and compare User and Lookalike Segments. Also implemented the more advanced Probabilistic Latent Semantic Analysis technique.
  • Data Scientist intern at LinkedIn
  • May 14 - Aug 14
    • Worked with the Data Sciences team on Clustering Fields of Study (Majors). Constructed networks of Fields of Study (FoS) using features like member skills and inferred classmates. Detected clusters of FoS using the Louvain modularity method (a hierarchical community detection algorithm for graphs/networks) to improve and modify LinkedIn's existing FoS taxonomy. The clusters obtained were significantly better than those of traditional hierarchical agglomerative clustering. Used visualization tools like Gephi and D3 to analyze the graph, clusters and taxonomy. Used Apache Pig and Hadoop to preprocess data, Python for all the data processing and HDFS to store the results.
  • Associate IT Consultant at ITC Infotech, India
  • July 10 - June 11
    • Worked as an Associate IT Consultant at ITC Infotech, India, with the Product Lifecycle Management team on a project for Brown Shoes, a global footwear company. Customized FlexPLM, software that runs on Windchill, according to the client’s requirements. Specifically, built web pages incorporating the necessary logic using JSP and JavaScript.

    Publications [DBLP] [Microsoft Academic Search]

    • G. Praveen Kumar, Anirban Sarkar, Ilhyun Lee, Haesun Lee and Narayan C. Debnath “A Novel Approach for Hierarchical Clustering in Non - Binary Search Space”, In the Proceedings of 8th International Conference on Industrial Informatics (INDIN ’10), pp. 693 - 697, Osaka, Japan, 13-16th July, 2010. [ PDF ]
    • G. Praveen Kumar and Anirban Sarkar “Weighted Association Rule Mining and Clustering in Non-Binary Search Space”, In the Proceedings of the 7th International Conference on Information Technology: New Generations (ITNG ’10), pp. 238 - 243, Las Vegas, Nevada, USA, 12–14 April, 2010. [ PDF ]
    • G. Praveen Kumar, Arjun Kumar Murmu, Biswas Parajuli, and Prasenjit Choudhury “MULET : A Multilanguage Encryption Technique”, In the Proceedings of the 7th International Conference on Information Technology: New Generations (ITNG ’10), pp. 779 - 782, Las Vegas, Nevada, USA, 12–14 April, 2010. [ PDF ]
    • G. Praveen Kumar, Biswas Parajuli, Arjun Kumar Murmu, Prasenjit Choudhury and Jaydeep Howlader “A Lossless MOD-ENCODER Towards a Secure Communication”, In the Proceedings of the International Conference on Recent Trends in Information, Telecommunication and Computing (ITC ’10), pp. 330 - 332, Cochin, Kerala, India, 12–13 March, 2010. [ PDF ]
    • Dominik Benz, Beate Krause, G. Praveen Kumar, Andreas Hotho, Gerd Stumme, “Characterizing Semantic Relatedness of Search Query Terms” In A. Nürnberger, M. Berthold (eds.): Proc. Workshop on Explorative analytics of Information Networks at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2009), pp. 119 - 135, Bled, Slovenia, 11th September, 2009. [ PDF ] [ PPT ]
    • G. Praveen Kumar, Anirban Sarkar and Narayan C. Debnath “A New Algorithm for Frequent Itemset Generation in Non-Binary Search Space”, In the Proceedings of 6th International Conference on Information Technology: New Generations (ITNG ’09), pp. 149 - 153, Las Vegas, USA, 27–29 April, 2009 (Nominated for Best Paper Award). [ PDF ]


    Projects

  • Label Prediction in Social Networks using NLP
  • Mar 15 - Present
    • Trying to predict the gender, political preferences and religious views of users (nodes) on social networks like Facebook. Initially used only network features and techniques like Gibbs Sampling for prediction. Then started looking at textual features to improve the prediction accuracy. For instance, by using the Facebook wall posts of users and their connections, their gender can be predicted with 92% accuracy.
      Project Guide: Dr. Dan Goldwasser
      Assistant Professor, Department of Computer Science, Purdue University
  • Quantitative Analysis of words and categories in Multiclass Regression
  • Dec 13 - Present
    • Trying to apply a joint high-dimensional Bayesian Variable and Covariance Selection model to multiclass text classification. The word features are the variables, and hence the variable selection problem corresponds to finding words that are good predictors overall and for specific categories. The covariance selection gives information about dependencies between the multiple categories.
      Project Guide: Dr. Anindya Bhadra
      Assistant Professor, Department of Statistics, Purdue University
  • Empirical Analysis of Personal Email Network
  • Nov 13
    • Constructed and analyzed three different types of ego networks obtained from Gmail, consisting of about seven and a half years of emails. Applied clustering and community detection algorithms to detect communities based on my email communications and compared them with communities detected from my Facebook friendship network. Interestingly, I could recover a good number of them. [Project Link] [Report]
  • TREC - Knowledge Base Acceleration Track
  • May 13 - Aug 13
    • The task was to filter documents related to entities (140 Wikipedia and 20 Twitter) that are worthy of citation in their profiles. The challenges were twofold: first, the data was huge, around 6.5 TB of compressed data consisting of social data, news articles etc.; second, the entities had very few training examples, on the order of 10. Built a model similar to a one-vs-all classifier; the F1 measure was close to 0.6.
      Project Guide: Dr. Luo Si
      Associate Professor, Department of Computer Science, Purdue University
  • Supervised LDA for Masquerader Detection
  • Feb 13 - Apr 13
    • Extended work from the PhD thesis of Malek Ben Salem, which builds user profiles based on search behaviour with a predefined taxonomy of applications and processes in order to detect masquerade attacks and intrusions. Built a novel method that uses a variation of LDA to build the taxonomy automatically. Also showed that by using the latent classes obtained from the model as features, we could build classifiers that give the same performance as those that used all the features, essentially a huge reduction of the feature space.
      Project Guide: Dr. Sergey Kirshner
      Associate Professor, Department of Computer Science, Purdue University
  • Indiana Social Search
  • May 12 - Present
    • Built a website in PHP, Indiana Social Search, to crawl and classify news articles and tweets from Google News and Twitter respectively into predefined categories. This system is going to be integrated with the famous INDURE project. I am also working on extracting trends from the classified articles to make a trend cloud of popular happenings in the state of Indiana, and on extracting meaningful summaries of the crawled news articles. [Project Link]
      Project Guide: Dr. Luo Si
      Associate Professor, Department of Computer Science, Purdue University
  • Sampling and Analysis of Social Network Activity Graphs
  • Sep 11 - Dec 11
    • Mining social networks yields valuable information about user activity and interaction. Constructed social network activity graphs of senders and receivers from the Purdue email data. Sampled the data over two-day windows and computed various graph properties, like the average degree and density, for these windows and for the aggregate graph. Compared and contrasted email user activity with that of friendship networks like Facebook.
      Project Guide: Dr. Jennifer Neville and Dr. Ramana Rao Kompella
      Assistant Professors, Department of Computer Science, Purdue University
  • Data Mining in Non-Binary Data Sets
  • Jul 08 - Apr 10
    • Binary dataset representation gives information about an item being present or not in the search space, but does not provide any information about the strength of its presence which can be more effective in drawing association rules close to real life situations. Hence, we developed an algorithm for mining frequent itemsets and association rules from non-binary search space. As an extension, we generated weighted association rules. Further, we developed clustering algorithms for non-binary search space.
      Project Guide: Dr. Anirban Sarkar and Dr. Narayan C. Debnath
      Assistant Professor, Department of Computer Application, NIT Durgapur and Professor, Department of CS, Winona State University, USA
  • Data Mining in Mobile Networking
  • Jul 09 – Apr 10
    • Analysed data from mobile networks for predicting user movement, customer recommendation, and business forecasting and analysis. Predicting user movement is a major concern in mobile communication, both for better handoff mechanisms and for ensuring quality of service. Grouped user profiles based on cell matrices and hierarchically clustered them. Also grouped frequent cells together based on user movement. Built a framework in Java for performing the necessary computations.
      Project Guide: Mr. Parag Kumar Guhathakurtha
      Assistant Professor, Department of Computer Science and Engineering, NIT Durgapur
  • Compression and Encryption for Secure Communication
  • Jul 09 - Nov 09
    • Ever-increasing internet traffic continually raises the need for better communication security. So, we developed an algorithm that performs encryption and lossless compression at the same time in order to increase bandwidth utilization and to secure data transmission. We essentially converted the message into a bi-tuple using mapping techniques and encoded only one element of the tuple.
      Project Guide: Mr. Prasenjit Choudhury and Mr. Jaydeep Howlader
      Assistant Professor, Department of Computer Application and Assistant Professor, Department of Information Technology, NIT Durgapur
  • Semantic Analysis in Query Log Data
  • May 09 - July 09
    • Mining semantic information from search engine query logs has great potential both for optimizing search engines and for bootstrapping Semantic Web applications. Further, the formalization of log data into Logsonomies retains semantic information. We therefore analysed and semantically characterized query term relatedness by grounding it in WordNet and compared it to prior results for Folksonomies.
      Project Guide: Dr. Gerd Stumme and Dr. Andreas Hotho
      Professor and Senior Researcher, Department of EE/CS, University of Kassel, Germany
  • MULET : A Multilanguage Encryption Technique
  • Mar 09 - Oct 09
    • The use of a multilingual approach in cryptography was not prevalent. So we focused on encryption of plain text over a range of languages supported by Unicode. We used mapping techniques to make the algorithm fast, efficient and easier to implement. Further, the replacement strategy used ensures better security. We believe this will facilitate the localization of Cryptographic Software tools.
      Project Guide: Mr. Prasenjit Choudhury
      Assistant Professor, Department of Computer Application, NIT Durgapur
  • Document Clustering using Lexical Chains
  • Dec 09 – Jan 10
    • Lexical chains can be used to group documents together based on a common idea contained in the documents. The quality of clustering was improved by considering hypernyms, hyponyms etc. to build synsets and, consequently, lexical chains. We also addressed the situation where a document has one set of lexical chains in common with one document, another set in common with another document, and so on. A hierarchy of clusters best depicts such a situation. Such hierarchies can be obtained from cliques in a graph whose nodes are the documents.
      Project Guide: Dr. B. Ravindran
      Associate Professor, Department of CSE, Indian Institute of Technology, Madras
  • Formal Verification of Software using the Spin Model Checker
  • Mar 08 – Feb 09
    • Was involved in a research project on formal verification of software. The Spin Model Checker is used to verify the integrity of software. Extracted the state transition diagrams and used the Promela language to get the properties verified. This may be extended to verify Web 2.0 applications and network protocols.
      Project Guide: Mr. Prasenjit Choudhury
      Assistant Professor, Department of Computer Application, NIT Durgapur



    Courses

  • Coursera
    • Machine Learning
    • Social Network Analysis
    • Social and Economic Networks: Models and Analysis
    • Mining Massive Datasets
    • Big Data in Education
    • Computing for Data Analysis
    • Statistics One
  • Computer Network Management
  • May 08 - June 08
    • An intensive hands-on training program in network management at Nettech INC., in association with the Goa Institute of Management, Goa. It covered practical aspects such as Linux essentials, shell scripting, socket programming, and the installation and maintenance of HTTP, FTP, DNS and NFS servers.


    Achievements

    • Received a scholarship from NIT Durgapur and NITDAA (NITD Alumni Association) as funding for my internship at the KDE group, University of Kassel, Germany.
    • Positioned 1st in “Open Project”, the Project cum Paper presentation contest in “Mukti ‘10”, the Annual Technical Symposium on GNU/Linux and Free Software of NIT Durgapur, 5th - 7th February, 2010.
    • Adjudged 1st in “Concepts” by IEEE Student Branch, NIT Durgapur for the best project abstract proposed amongst 40 abstracts.
    • Stood 1st in “The Brand Game” for designing and marketing a Mutual Fund firm in “aarohan2k9”, a National Level Techno-Management Festival of NIT Durgapur held during 26th February - 1st March, 2009.
    • Awarded 2nd prize in “Konfigure”, the System Administration contest in “Mukti ‘09”, the Annual Technical Symposium on GNU/Linux and Free Software of NIT Durgapur, 2nd - 8th February, 2009.
    • Judged as 3rd best undergraduate performer by Sun MicroSystems for the project “The Ultimate Exam Simulator” in Share 2008.
    • Secured a place in the Top 10 among 138 participants in the Network Management Training Program, organized by Goa Institute of Management, Goa and Nettech INC.
    • Certified ‘Good’ Core Java professional by NIIT.
    • Received a Certificate of Merit for scoring high marks in Standards XII and X.

    Technical Skills

    • Programming languages: C/C++, Java, Python, Perl, PL/SQL, Visual Basic, LaTeX.
    • Data analysis and visualization: R, Matlab, Gephi, Lemur and Indri toolkits, RapidMiner, D3.
    • Big Data technologies: Hive, Apache Pig, Hadoop.
    • Web Development: HTML/DHTML, PHP, JSP, Ajax.
    • Tools: Eclipse, NetBeans, SVN, GitHub, Star UML, Adobe Dreamweaver and Flash.
    • Hardware: Verilog.
    • Databases: HDFS, Titan, Oracle, MySQL, IBM DB2, MSSQL.
    • Logic: Prolog.
    • Platform Expertise: Linux, Unix, Windows and Macintosh.

    Positions of Responsibility

    © 2011-2015 G. Praveen Kumar