We should not judge people by their peak of excellence; but by the distance they have traveled from the point where they started.

Friday, April 10, 2009

What is research?

What is research? I was a dumb guy who made many mistakes when I landed here because I did not know exactly what research is. I believe many of you know it already. For those who do not, here is a glimpse into the world of research. To all those who are planning to start graduate school this year: my best wishes, and may you have a successful academic career.

For all the systems guys out there: research is not equivalent to the Linux kernel. The Linux kernel is one small piece that helps build your understanding of research. The Linux kernel or the Windows kernel is an implementation; if you already know it, becoming a researcher gets easier. If you do not know it, you can still survive as a researcher by knowing the concepts. On a broader scale, though, what industry demands from you if you are planning a career in the systems area is how conversant you are with the Linux kernel. So getting your hands dirty in it is strongly recommended.

Research is a vast area concerned with solving problems that are still unsolved. It falls into two categories.

1. Theoretical research -
2. Practical research -

Theoretical research covers topics such as statistics, mathematical proofs, machine learning algorithms, algorithmic proofs, and programming language proofs: basically anything in the field of theory that can be solved with pencil and paper and, if the need arises, demonstrated with a simple working prototype. Why is theoretical research needed? Because building a system is often not the only answer; something more general is needed to be able to say that if a result can be proved on paper, an implementation of it should work. Theory builds confidence behind the large work one plans to do and shows that things will work.

Practical research -
This covers systems research (operating systems, storage systems, real-time systems, distributed systems, large parallel scientific systems, etc.), database research, gaming research, artificial intelligence, networking research, compilers research, and so on. So now you can see that within systems, the Linux kernel, on which most of the industry works, is just a small part of the big picture. The Linux kernel is an implementation, and most of the work in it is already done. The research associated with it is about improving the existing algorithms and the like. The point to stress is that Linux kernel proficiency is not research. But if you are proficient, you will have a definite edge over your peers and will learn much faster.

A good researcher takes a topic, identifies the problem, surveys how the research community has addressed it so far, and decides whether it is interesting and worth pursuing. Research need not address every problem related to the topic from every perspective. For example, if you are building a new desktop search engine, you might concentrate solely on the search algorithms and the core engine. For the time being you need not worry about other aspects, such as the security the search engine offers when several users share a desktop. The crux is that research on a particular topic has to contribute in at least one dimension; it need not contribute in every aspect. A good researcher makes use of both the theoretical and practical aspects of research and does not limit himself or herself to one of them.

Common kinds of research are categorized as follows.
1. Survey based research -
Compare existing methods, approaches, and implementations to analyze their pros and cons, differentiate them, and suggest improvements.

2. Analysis based research -
Collect large data sets from different existing frameworks and benchmarks and look for common trends. For example, a study of how files are stored on the hard drives of corporate users could reveal trends on which new research could be based.

3. Build a collective implementation -
Different systems consider different problem complexities and address different issues. A collective implementation integrates the features researched by different researchers into a single system and presents it as one piece of research. The Java language is a nice example of such research: it incorporates features of different programming languages in a single language.

4. A totally new implementation -
A totally new implementation, which either improves on a previous approach or takes a totally new approach. For example, a totally new I/O scheduler or a totally new RAID level in storage systems.
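
To make the analysis-based category above a little more concrete, here is a minimal sketch of the kind of measurement such a study might start from. It assumes the "data set" is simply a directory tree on a local disk. The function name `file_size_histogram` and the power-of-two bucketing are choices made for this illustration only, not taken from any particular study.

```python
import os
from collections import Counter


def file_size_histogram(root):
    """Count files under `root`, grouped into power-of-two size buckets.

    Bucket 0 holds empty files; bucket k (k >= 1) holds files whose size
    in bytes lies in the range [2**(k-1), 2**k).
    """
    histogram = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Skip files that vanish or cannot be read during the scan.
                continue
            # int.bit_length() is 0 for 0 and k for values in [2**(k-1), 2**k).
            histogram[size.bit_length()] += 1
    return histogram


if __name__ == "__main__":
    # Scan the current directory and print one line per bucket.
    for bucket, count in sorted(file_size_histogram(".").items()):
        upper = 0 if bucket == 0 else 2 ** bucket
        print(f"files smaller than {upper} bytes (bucket {bucket}): {count}")
```

Running this over many machines and comparing the histograms is where the "common trends" would come from; the sketch only covers the data collection step.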

Research is presented and accepted by submitting it as a publication, or paper, to conferences. There, eminent researchers from academia and industry review the papers and accept them. The research community is very small compared to the developer community, so at conferences one tends to meet the same people again and again. Some of the well-known systems conferences are FAST, OSDI, PDSI, etc. Research is also published in journals and as technical reports.

http://www.usenix.org/events/fast

Universities in the US fall into two categories.
1. Research-based universities
2. Universities that are not so research-based

A typical practical research paper is arranged in the following format:
1. Abstract
2. Introduction
3. Design
4. Implementation details
5. Experiments
6. Analysis
7. Related work
8. Conclusion

A good way to read a research paper is to go over the entire paper in a first reading to pick up the core concept and keywords. In the second reading, once you have enough context in your head to understand the direction the paper is taking, read it thoroughly. Some papers are written very well, in an elaborate and understandable manner; some are written badly. So do not feel bad if you do not understand a paper even after putting in 100% effort.

A good way to succeed in the US education system is to understand how the people around you (US people) behave and study, and to follow in their footsteps to some extent. Read the material that is going to be covered in class beforehand, going over it once from textbooks, presentations, web articles, or Wikipedia, so that you have the context in your mind. Then when the professor starts speaking you will feel involved in the lecture. If your class is project oriented, take projects that make you learn more things so that you get maximum exposure. The best advice is to talk to as many people as you can in class, make friends, study in groups, have lots of brainstorming and discussions, and share knowledge. Discussions are the best way to get things done.

Get out of the notion that "C" is the best language. A good researcher thinks that the language suited to the job at hand is the best language, whether that is Python for scripting or prototyping, Java for network prototyping, or JavaScript for web services. The crux is not to assume that a certain approach is best and must be followed; keep your eyes and mind open and receptive to new learning. Do not lock yourself into the notion that this is good and that is bad. Everything has its own merit and deserves respect, so learn as much of it as you can.

That is most of it. I wish you all the best for a great and enjoyable academic session.