Project
Table of contents
- Overview of Course Project
- Project Proposal
- Midpoint Presentation
- Midpoint Report
- Final Poster
- Final Report
Overview of Course Project
Deep learning, a subfield of machine learning, is widely used in modern technologies such as driverless cars and video captioning. The main goal of the course project is therefore for you to apply deep learning techniques to “real” problems. This will prepare you to begin a career in deep learning and machine learning, or at least help you navigate challenges you may encounter in modern data and statistical analysis.
Background Information
The first step of the project is to choose a research topic for the project proposal. There are three types of projects:
- Application project: This is the most common type of project. Apply an existing deep learning algorithm to a novel problem, or apply the algorithm to the problem in a new way. In either case, the application will be novel.
- Algorithmic project: Develop a new learning algorithm or a model architecture and show its application on a dataset.
- Theoretical project: Analyze and prove an interesting, non-trivial property of a new or existing learning algorithm. This type of project is rare and very difficult to carry out, so please avoid this option, as it will most likely be unsuccessful.
Most projects will be a combination of the first two types. Replicating the results of a paper can also be a good way to learn. However, if you replicate a paper, you also need to apply the technique to another application or analyze how each component of the model contributes to the final performance.
When choosing your project topic, make sure that you will be able to implement the project within the duration of the course. Your final presentation and report are expected to include results obtained from your deep learning models, as these results contribute substantially to your final grade on the project.
Your final written report should be close to publication quality for a conference or journal, meaning that, with some refinement and additional work, your course project could be submitted to one. You may also be able to use the course project as the beginning of a thesis project.
How to Find Project Ideas
The best way to find project topics is to look at previous work.
The two main conferences for deep learning are ICML and NIPS. The published work from both conferences can be found at:
- NIPS (also known as NeurIPS)
- ICML: simply change the year in the address bar to go to previous conferences
Even better, the published articles from those two conferences and other relevant conferences are also available in one location, with the various conferences listed by year.
You can also check for background information and relevant research on your topic using an academic search engine.
For your project, you will need to consider what dataset you will work on and how you will obtain it. If the dataset needs significant preprocessing or you aim to collect the data yourself, be aware of how much time this aspect of the project will take, since other aspects, such as the actual implementation of a deep learning method on the dataset, will still need to be completed.
You are encouraged to collect your own data.
However, if you are having trouble, you can use data from pre-curated sources. Prepared and somewhat preprocessed datasets are available from sources such as:
- Kaggle (recommended): Kaggle also provides free Jupyter Notebook environments for data processing.
- UCI machine learning repository: There are some good datasets here too. They tend to come from published articles.
The topic of your project can be from areas such as the following:
- Athletics & Sensing Devices
- Audio & Music
- Computer Vision
- Finance & Commerce
- General Machine Learning
- Life Sciences
- Natural Language
- Physical Sciences
- Theory & Reinforcement Learning
Also, for additional project ideas, please check the List of Projects at the bottom of the page of the following link:
Project Evaluation
The project is divided into four parts for a total of 100 points:
- Proposal (10 points / 4% of total grade)
- Midpoint Report & Presentation (20 points / 8% of total grade)
- Final Poster (30 points / 12% of total grade)
- Final Paper (40 points / 16% of total grade)
Overall, the project will be evaluated on:
- Technical quality of the work
  - Does the technical material make sense?
  - Are the experiments conducted in a logical and reasonable manner?
  - Are the proposed algorithms or applications clever and interesting?
  - Do the authors convey novel insight about the problem and/or algorithms?
- Significance
  - Did the authors choose an interesting or a “real” problem to work on, or only a small “toy” problem?
  - Is this work likely to be useful and/or have impact?
- Novelty of the work
  - Is the project applying a common technique to a well-studied problem, or is the problem or method relatively unexplored?
To demonstrate the novelty and effectiveness of your project, you need to clearly indicate the importance of your topic, the improvement you are implementing, and how it compares to previous work.
Project Proposal
Due Date: Tue., Mar. 15 @ 11:59 pm
The goal of the project proposal is for you to begin work on a project and to receive feedback. You are not allowed to do joint projects with other classes.
Deliverable
In a PDF document, please provide:
1. team name and proposed project title
2. the names of your group members, including your student numbers
3. a summary of approximately 300 – 500 words, describing the project
For (3), the summary should include the following sections and information:
- Background and Motivation:
  - Provide some context for the problem you are trying to solve.
  - Is this an application or a theoretical result?
  - What is the previous research?
- Method:
  - What deep learning techniques are you planning to apply?
- Intended experiments:
  - What type of data will you use?
  - What experiments are you planning to run?
  - How do you plan to evaluate your deep learning algorithm?
The description and analysis of the problem should persuade the reader that it is a worthwhile problem to study.
Please make sure the document is professional – clear writing and proper formatting.
Submission
You will submit your PDF document as a group through Blackboard.
Grading
The proposal is worth 10 points:
- Problem and Motivation (4 points)
- Analysis of Problem (4 points)
- Novelty and Creativity (1 point)
- Report Clarity and Presentation (1 point)
Examples
Proposal Examples (2021 Spring)
Midpoint Presentation
Due Date: Mon., Apr. 11 @ 11:59 pm
Deliverable
You will create a presentation that summarizes your midpoint report. Your presentation should be 6 slides long and 3 minutes in length.
Submission
You will submit your presentation as a PowerPoint (.pptx) or PDF file as a group through Blackboard.
Grading
The midpoint presentation is worth 5 points:
- Introduction (0.75 points)
- Related Work (0.25 points)
- Materials and Method (2 points)
- Preliminary Results (1.5 points)
- Presentation Clarity and Organization (0.5 points)
Examples
Midpoint Presentation Examples (2021 Spring)
Midpoint Report
Due Date: Fri., Apr. 15 @ 11:59 pm
The midpoint report will help keep your project on track. It needs to describe what you’ve accomplished thus far and briefly state your next experiments and steps. It should be written as an “early draft” of what will ultimately be your final project report. Basically, you are writing the first few pages of your final project report, so that you can reuse most of the midpoint report text in your final report.
Please write the midpoint report (and eventually your final report) with the understanding that the intended audience is individuals who understand deep learning. As a result, you should not spend two pages explaining how an autoencoder works. Rather, you should summarize the main concept behind the algorithm and focus more on the reasoning behind your experiments and the implications of your results (i.e., the Discussion section). Also, make sure that you include sufficient related work in your Introduction section.
After the submission of the midpoint report, the expectation is that your final project report will be on the same topic. Hence, this is your last chance to make any adjustments to your project topic focus.
Deliverable
You will write a report with the following specifications.
Your report should be 2-3 pages long, excluding references, and should include the following:
INTRODUCTION (0.5 – 0.75 pages)
This is an expansion of your Background and Motivation from your proposal.
Explain the problem and why it is important. Discuss your motivation for pursuing this problem. Give some background if necessary. Clearly state what the input and output of your model are. Be very explicit: “The input to our algorithm is an {image, amplitude, patient age, rainfall measurements, grayscale video, etc.}. We then use a {neural network, linear regression, etc.} to output a predicted {age, stock price, cancer type, music genre, etc.}.” This is very important since different teams have different inputs/outputs spanning different application domains. Being explicit about this makes it easier for readers. Also include a figure demonstrating the workflow of your overall idea. The idea is to make your paper more accessible, especially to readers who start by skimming your paper.
RELATED WORK (≈ 0.5 page)
You should find existing papers, group them into categories based on their approaches, and discuss their strengths and weaknesses, as well as how they are similar to and differ from your work. In your opinion, which approaches were clever/good? What is the state-of-the-art? Do most people perform the task by hand? You should aim to have at least 5 references in the related work. Include previous attempts by others at your problem, previous technical methods, or previous learning algorithms. Google Scholar is very useful for this (you can click “cite” and it generates MLA, APA, BibTeX, etc.).
MATERIALS AND METHODS (1.5 − 2.5 pages)
You should have the following two subsections:
Dataset and Features (≈0.5 − 1 pages)
Describe your dataset: how many training/validation/test examples do you have? Is there any preprocessing you did? What about normalization or data augmentation? What is the resolution of your images? How is your time-series data discretized? Include a citation for the source of your dataset. Depending on available space, show some examples from your dataset (e.g. include an image, show a waveform, etc.). You should also talk about the features you used. If you extracted features using Fourier transforms, word2vec, histogram of oriented gradients (HOG), PCA, ICA, etc., make sure to talk about it.
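As a minimal sketch of the kind of split and normalization this subsection asks you to report, the following assumes your features fit in a NumPy array. The function name `split_and_normalize`, the 70/15/15 fractions, and the seed are illustrative assumptions, not course requirements.

```python
import numpy as np

def split_and_normalize(X, y, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle, split into train/val/test, and standardize with training statistics."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_frac))
    n_val = int(round(len(X) * val_frac))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]

    # Compute mean/std on the training split only, so that no information
    # from the validation or test examples leaks into preprocessing.
    mean = X[train_idx].mean(axis=0)
    std = X[train_idx].std(axis=0) + 1e-8  # small constant avoids division by zero

    def norm(a):
        return (a - mean) / std

    return (
        (norm(X[train_idx]), y[train_idx]),
        (norm(X[val_idx]), y[val_idx]),
        (norm(X[test_idx]), y[test_idx]),
    )

# Hypothetical data: 100 examples with 4 features each.
X = np.random.default_rng(1).normal(size=(100, 4))
y = np.arange(100) % 2
train, val, test = split_and_normalize(X, y)
print(len(train[0]), len(val[0]), len(test[0]))  # 70 15 15
```

Reporting the three split sizes and stating that normalization statistics came from the training split answers several of the questions above in one or two sentences.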
Methods (≈1 − 1.5 pages)
Formally describe your learning algorithms or proposed algorithm(s). Make sure to include relevant mathematical notation. For example, you can briefly include the formula for an RNN or say what the softmax function is. It is okay to use formulas from lecture notes. For each algorithm, give a short description (≈1 paragraph) of how it works. Again, we are looking for your understanding of how these machine learning algorithms work. Additionally, if you are using a niche or cutting-edge algorithm, you may want to explain your algorithm using 1/2 paragraph.
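For instance, a Methods subsection might state the softmax function and a vanilla RNN update as follows. The symbols ($z$, $K$, $h_t$, $x_t$, $W_{hh}$, $W_{xh}$, $b_h$) are notational assumptions chosen for illustration; use the notation from the lecture notes or from the papers you cite.

```latex
% Softmax over K class scores z_1, ..., z_K
\[
\operatorname{softmax}(z)_i = \frac{\exp(z_i)}{\sum_{j=1}^{K} \exp(z_j)},
\qquad i = 1, \dots, K
\]

% Hidden-state update of a vanilla RNN at time step t
\[
h_t = \tanh\left( W_{hh}\, h_{t-1} + W_{xh}\, x_t + b_h \right)
\]
```

Defining each symbol once in the surrounding text (for example, that $x_t$ is the input at time step $t$) is usually enough for readers who already know deep learning.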
You should also give details about what (hyper)parameters you chose (e.g. why did you use X learning rate for gradient descent, what was your mini-batch size and why) and how you chose them. How did you split your dataset? Before you list your results, make sure to list and explain what your primary metrics are: accuracy, precision, AUC, etc. Provide equations for the metrics if necessary. For results, you want to have a mixture of tables and plots. If you are solving a classification problem, you should include a confusion matrix or AUC/AUPRC curves. Include performance metrics such as precision, recall, and accuracy. For regression problems, state the average error. Some of the experimental setups can also be summarized as a table to make it clearer for the reader to follow along.
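To make the metrics concrete, here is a small sketch of how a confusion matrix, accuracy, precision, and recall relate to one another for a binary classifier. The labels and predictions are made-up illustrative values, and the helper names are hypothetical; in practice you may prefer the equivalent functions from a library you already use.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count examples per (true class, predicted class); rows are true classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def binary_metrics(cm):
    """Accuracy, precision, and recall for a 2x2 matrix, treating class 1 as positive."""
    tn, fp, fn, tp = cm[0, 0], cm[0, 1], cm[1, 0], cm[1, 1]
    accuracy = (tp + tn) / cm.sum()
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return accuracy, precision, recall

# Made-up labels and predictions for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=2)
accuracy, precision, recall = binary_metrics(cm)
print(cm.tolist())                   # [[3, 1], [1, 3]]
print(accuracy, precision, recall)   # 0.75 0.75 0.75
```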
PRELIMINARY RESULTS AND NEXT STEPS (≈0.5 − 1 pages)
- Describe the experiments that you’ve run, the outcomes, and any error analysis that you’ve done.
- You should have tried at least one baseline model to compare your model implementation and architecture.
- Given your preliminary results, what are the next steps that you’re considering?
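One simple baseline for a classification task is to always predict the most frequent training label. The sketch below is only an illustration of that idea; the function name and the toy labels are assumptions, and a stronger baseline (for example, a linear model) may be more appropriate for your problem.

```python
from collections import Counter

def majority_class_baseline(train_labels, test_labels):
    """Predict the most frequent training label for every test example."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for label in test_labels if label == majority_label)
    return majority_label, correct / len(test_labels)

# Toy labels for illustration only.
train_labels = ["cat", "dog", "cat", "cat", "bird"]
test_labels = ["cat", "dog", "cat", "bird"]
label, baseline_accuracy = majority_class_baseline(train_labels, test_labels)
print(label, baseline_accuracy)  # cat 0.5
```

If your model does not clearly beat a baseline like this on the test set, that is worth discussing in your next steps.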
Submission
You will submit the above requirements as a Word document (.docx) written using the NIPS Word template. You will also submit a copy of the document as a PDF.
You will submit both your Word and PDF documents as a group through Blackboard.
Grading
The midpoint report is worth 15 points:
- Introduction (3 points)
- Related Work (2 points)
- Materials and Method (5 points)
- Preliminary Results and Next Steps (3 points)
- Report Clarity and Quality (2 points)
Examples
Midpoint Report Examples (2021 Spring)
Final Poster
Due Date: Thu., Apr. 28 @ 11:59 pm
You will summarize your project as a poster presentation that other students can view on Blackboard. This will give everyone an opportunity to see what other students did for their projects.
Deliverable
Your poster will be in “digital format”, which means that you will use the PowerPoint file provided with the assignment as a template for making the poster. The PowerPoint poster is optimized to be viewed on a digital screen rather than printed as a physical poster. Your poster should have the following sections:
Title | Your project title |
Team | Include your names and student emails |
Motivation | Briefly explain the motivation for your topic, what you built, and the results. It’s easier to think of this as a quick summary of the inputs and outputs. (5 sentences max) |
Data | Exactly where did your data come from and what does it contain? (i.e., what is in the rows and columns? Are examples labeled with ground truth? If you have images, are they color, normalized, etc.?) (2-3 sentences max) |
Features | How many features do you have, and which features are raw input data (e.g., images, text) vs. learned features (e.g., from a CNN, GAN, etc.)? Why are they appropriate for this task? (3-4 sentences max) |
Models | Exactly which model(s) are you using? Write out the basic math formulas and clearly note any modifications or additions. If you have more than one model, make subsections for each. (3-4 sentences max) |
Results | Make a compact table of results. Each row should be a different model. The columns should be the training error and the test error. List how many samples are in each of the training and testing data sets. Obviously, these sets should be different. (1-2 sentences max + 1 table max) |
Discussion | This is where you share your thoughts about your project. (Hopefully you have a few interesting interpretations!) Briefly summarize what just happened. Briefly explain whether or not you expected your results. If your results were good, explain why. If they were not good, explain why. (6 sentences max) |
Future | If you had another 6 months to work on this, what would you do first? (2-3 sentences max) |
References | For example, IEEE style is suitable. |
Also take a look at the tips for poster design from Stanford.
Submission
Please submit a PowerPoint file using the provided template.
You will submit your presentation as a group through Blackboard.
Grading
Your posters are worth 30 points and will be graded on quality, clarity, and the technical content of the poster. Make sure that other students will be able to understand your experiments and results just by looking at the poster.
Examples
Final Poster Examples (2021 Spring)
Final Report
Due Date: Tue., May 3 @ 11:59 pm
You will continue and finish the report that you started for the midpoint check-in. Essentially, you will wrap up your results and append a discussion section to your midpoint report. Furthermore, you will submit code that can reproduce the processing you did for this project. Reproducible code is an important aspect of deep learning and machine learning research, as it allows other researchers to build upon your work.
Deliverable
The following are the expectations of your final report and code for your project.
Report
Your report will be 8 pages long including figures and tables, but excluding references, which will be a separate page. Your report needs to be in the NIPS format and should include the following:
ABSTRACT (200 – 250 words)
It should consist of one paragraph of ~200 – 250 words covering the motivation for your paper, a high-level explanation of the methodology you used, a brief report of your results, and your main conclusions.
INTRODUCTION (0.5 – 0.75 pages)
This is an expansion of your Background and Motivation from your proposal.
Explain the problem and why it is important. Discuss your motivation for pursuing this problem. Give some background if necessary. Clearly state what the input and output of your model are. Be very explicit: “The input to our algorithm is an {image, amplitude, patient age, rainfall measurements, grayscale video, etc.}. We then use a {neural network, linear regression, etc.} to output a predicted {age, stock price, cancer type, music genre, etc.}.” This is very important since different teams have different inputs/outputs spanning different application domains. Being explicit about this makes it easier for readers. Also include a figure demonstrating the workflow of your overall idea. The idea is to make your paper more accessible, especially to readers who start by skimming your paper.
RELATED WORK (≈ 0.5 page)
You should find existing papers, group them into categories based on their approaches, and discuss their strengths and weaknesses, as well as how they are similar to and differ from your work. In your opinion, which approaches were clever/good? What is the state-of-the-art? Do most people perform the task by hand? You should aim to have at least 5 references in the related work. Include previous attempts by others at your problem, previous technical methods, or previous learning algorithms. Google Scholar is very useful for this (you can click “cite” and it generates MLA, APA, BibTeX, etc.).
MATERIALS AND METHODS (1.5 − 2.5 pages)
You should have the following two subsections:
Dataset and Features (≈0.5 − 1 pages)
Describe your dataset: how many training/validation/test examples do you have? Is there any preprocessing you did? What about normalization or data augmentation? What is the resolution of your images? How is your time-series data discretized? Include a citation for the source of your dataset. Depending on available space, show some examples from your dataset (e.g. include an image, show a waveform, etc.). You should also talk about the features you used. If you extracted features using Fourier transforms, word2vec, histogram of oriented gradients (HOG), PCA, ICA, etc., make sure to talk about it.
Methods (≈1 − 1.5 pages)
Formally describe your learning algorithms or proposed algorithm(s). Make sure to include relevant mathematical notation. For example, you can briefly include the formula for an RNN or say what the softmax function is. It is okay to use formulas from lecture notes. For each algorithm, give a short description (≈1 paragraph) of how it works. Again, we are looking for your understanding of how these machine learning algorithms work. Additionally, if you are using a niche or cutting-edge algorithm, you may want to explain your algorithm using 1/2 paragraph.
You should also give details about what (hyper)parameters you chose (e.g. why did you use X learning rate for gradient descent, what was your mini-batch size and why) and how you chose them. How did you split your dataset? Before you list your results, make sure to list and explain what your primary metrics are: accuracy, precision, AUC, etc. Provide equations for the metrics if necessary. For results, you want to have a mixture of tables and plots. If you are solving a classification problem, you should include a confusion matrix or AUC/AUPRC curves. Include performance metrics such as precision, recall, and accuracy. For regression problems, state the average error. Some of the experimental setups can also be summarized as a table to make it clearer for the reader to follow along.
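If you report AUC, it can help to state what the number means: the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; the function name and the example scores are illustrative assumptions, and a library implementation is preferable for large datasets because this version compares every positive/negative pair.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up labels and model scores for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
print(auc)  # 0.75
```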
RESULTS/DISCUSSION (1 − 3 pages)
You may either have separate Results and Discussion sections or combine them into one larger section. It is up to you how you choose to explain the implications of your results. It is important to highlight any nuances in your findings.
Overall, you should have both quantitative measures of your results and qualitative explanations of your results. It would be good to have an experimental comparison of the results of your method with baseline methods. Include visualizations of results, heatmaps, examples of where your algorithm failed, and a discussion of why certain algorithms failed or succeeded. In addition, explain whether you think your model overfits the training set (or has any other limitations) and what, if anything, you did to mitigate that. Make sure to discuss the figures/tables in your main text throughout this section. Your plots should include legends and axis labels, and have font sizes that are legible when printed.
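One way to support a statement about overfitting is to compare per-epoch training and validation losses. The following is a rough sketch under the assumption that you logged one loss value per epoch for each split; the function name and the loss values are hypothetical.

```python
def summarize_overfitting(train_losses, val_losses):
    """Return the best validation epoch, the final val-train gap, and a simple flag."""
    if len(train_losses) != len(val_losses) or not train_losses:
        raise ValueError("expected two non-empty loss lists of equal length")
    # Epoch (0-indexed) at which the validation loss was lowest.
    best_epoch = min(range(len(val_losses)), key=lambda e: val_losses[e])
    # Gap between validation and training loss at the end of training.
    final_gap = val_losses[-1] - train_losses[-1]
    # A final validation loss above its minimum suggests the model may have
    # started to overfit after best_epoch.
    val_rose_after_best = val_losses[-1] > val_losses[best_epoch]
    return best_epoch, final_gap, val_rose_after_best

# Made-up loss curves for illustration only.
train_losses = [1.00, 0.70, 0.50, 0.35, 0.25]
val_losses = [1.05, 0.80, 0.65, 0.70, 0.80]
best_epoch, final_gap, val_rose = summarize_overfitting(train_losses, val_losses)
print(best_epoch, val_rose)  # 2 True
```

A plot of both curves with the best epoch marked conveys the same information visually and is usually worth including.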
CONCLUSIONS/FUTURE WORK (1 − 2 paragraphs)
Summarize your report and reiterate key points. Which algorithms were the highest performing? Why do you think that some algorithms worked better than others? For future work, if you had more time, more team members, or more computational resources, what would you explore?
REFERENCES (1 page limit outside of the 8 page limit for the report)
This section should include citations for: (1) Any papers mentioned in the related work section. (2) Papers describing algorithms that you used which were not covered in class. (3) Code or libraries you downloaded and used. This includes libraries such as scikit-learn, MATLAB toolboxes, TensorFlow, PyTorch, etc. Acceptable formats include: MLA, APA, IEEE. If you do not use one of these formats, each reference entry must include the following (preferably in this order): author(s), title, conference/journal, publisher, year.
CONTRIBUTIONS (outside of the 8 page limit for the report)
This section should describe what each team member worked on and contributed to the project. Simply append this to the end of the References section.
Code
Your project and results should be reproducible. There might be small differences in the results due to randomization and different hardware systems, but your code should run and provide similar results. Please provide one of the following: a zip file that includes the code and data, a link to a GitHub repository, or a Python Notebook that will download your data and produce the results in your final project.
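A common first step toward reproducible results is to fix the random seeds at the start of every script or notebook. The sketch below seeds only the Python and NumPy generators; the helper name `set_seed` and the seed value are assumptions, and you would add the seeding call documented by whichever deep learning framework you use.

```python
import random

import numpy as np

def set_seed(seed=0):
    """Seed the Python and NumPy random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    # If you use a deep learning framework, also seed it here with the call
    # documented by that framework (for PyTorch, torch.manual_seed(seed)).

# Re-seeding with the same value reproduces the same random draws.
set_seed(42)
first = (random.random(), np.random.rand())
set_seed(42)
second = (random.random(), np.random.rand())
print(first == second)  # True
```

Seeding does not remove every source of variation (some GPU operations are nondeterministic), which is why small differences in results are acceptable.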
Submission
You will submit a Word document (.docx) and a PDF written using the NIPS template (provided in the midterm assignment posting and also provided here).
You will submit both your Word and PDF files as a group through Blackboard.
You will also submit your code either as a zip file or a link according to the above specifications.
Grading
Your final report is worth 40 points and will be graded on quality, clarity, and the technical content. The following is the grading breakdown:
Report
- Abstract (1 point)
- Introduction (11 points)
  - Motivation (4 points)
  - Figure or diagram (4 points)
  - Related work section (3 points)
- Materials and Methods (5 points)
  - Data Description (2 points)
  - Methods and Experimental Setup (3 points)
- Results/Discussion (15 points)
  - Demonstration of results and/or comparison (8 points)
    - Figures and/or tables are important
  - Implications of results (5 points)
  - Limitations (2 points)
- Conclusions (1 point)
- References (1 point)
Code
- Working Code (2 points)
Technical Quality
- Points will be given for going above and beyond. We are looking for some form of creativity, e.g. clever experiments that reveal an interesting phenomenon, tricks for circumventing obstacles, etc. (5 points)
Examples
Final Report Examples (2021 Spring)