Project

Overview of Course Project
Project Proposal
Midpoint Presentation
Project Check-in
Final Presentation

Overview of Course Project

Deep learning, a subfield of machine learning, is widely used in many modern technologies, such as driverless cars and video captioning. The main goal of the course project is therefore for you to apply deep learning techniques to “real” problems. This will prepare you to begin a career in deep learning and machine learning, or at least help you navigate challenges you may encounter in modern data and statistical analysis.

Background Information

The first step of the project is to choose a research topic for the project proposal. There are three types of projects:

Application project: This is the most common type of project. Apply an existing deep learning algorithm to a novel problem, or apply the algorithm to a problem in a new way. In either case, the application will be novel.
Algorithmic project: Develop a new learning algorithm or a model architecture and show its application on a dataset.
Theoretical project: Analyze and prove an interesting, non-trivial property of a new or existing learning algorithm. This type of project is rare and very difficult to carry out, so please avoid this option, as it will most likely be unsuccessful.

Most projects will be a combination of the first two types. Replicating the results of a paper can also be a good way to learn. However, if you replicate a paper, you also need to apply the technique to another application and analyze how each component of the model contributes to the final performance.

When choosing your project topic, make sure that you will be able to implement the project within the duration of the course. Your final presentation and report are expected to include results obtained from your deep learning models, as these results contribute substantially to your final grade on the project.

Your final written report for the project should be close to publication quality for a conference or journal. This means that, with some refinement and additional work, your course project could be submitted to a conference or journal. You may also be able to use the course project as the beginning of a thesis project.

How to Find Project Ideas

The best way to find project topics is to look at previous work.

For project ideas, please check the List of Projects at the bottom of the page of the following link:

Berkeley Stat 157

You can also check out Past Projects from Stanford CS230 at the following link:

CS230 - Past Projects

Some of the projects shown here are not deep learning, but rather machine learning. However, there are many deep learning examples as well.

The two main conferences for deep learning are ICML and NIPS. The published work from both conferences can be found at:

NIPS (also known as NeurIPS)
ICML: simply change the year in the address bar to go to previous conferences

Even better, you can find the published articles from those two conferences and other relevant conferences in one location, with the conferences listed by year:

You can also check for background information and relevant research about your topic using an academic search engine such as:

Google Scholar

For your project, you will need to consider what dataset you will work on and how you will obtain that dataset. If the dataset needs significant preprocessing or you aim to collect the data yourself, be aware of how much time will be devoted to this aspect of the project, since other aspects, such as the actual implementation of a deep learning method on the dataset, will still need to be completed.

You are encouraged to collect your own data.

However, if you are having trouble, you can use data from precurated sources. You can obtain prepared and somewhat preprocessed datasets from sources such as:

Kaggle: Kaggle also provides free Jupyter Notebook environments for data processing.
UCI machine learning repository: There are some good datasets here too. They tend to come from published articles.

The topic of your project can be from areas such as the following:

Athletics & Sensing Devices
Audio & Music
Computer Vision
Finance & Commerce
General Machine Learning
Life Sciences
Natural Language
Physical Sciences
Theory & Reinforcement Learning

Project Evaluation

The project is divided into four parts for a total of 50 points:

Proposal (5 pts)
Midpoint Presentation (10 pts)
Project Check-in (5 pts)
Final Presentation (30 pts)

Overall, the project will be evaluated on:

Technical quality of the work
- Does the technical material make sense?
- Are the experiments conducted in a logical and reasonable manner?
- Are the proposed algorithms or applications clever and interesting?
- Do the authors convey novel insight about the problem and/or algorithms?
Significance
- Did the authors choose an interesting or a “real” problem to work on, or only a small “toy” problem?
- Is this work likely to be useful and/or have impact?
Novelty of the work
- Is the project applying a common technique to a well-studied problem, or is the problem or method relatively unexplored?

To demonstrate the novelty and effectiveness of your project, you need to clearly indicate the importance of your topic, the improvement you are implementing, and how it compares to previous work.

Project Proposal

Due Date: Thu., Feb. 8 @ 9:00 am

The goal of the project proposal is for you to begin work on a project and to receive feedback. You are not allowed to do joint projects with other classes.

Deliverable

In a 5-minute presentation, please provide the following information:

Background and Motivation:
- Provide some context for the problem you are trying to solve.
- Is this an application or a theoretical result?
- What is the previous research?
Data:
- What is the dataset you plan to work on?
- What type of features does it have?
Method:
- What machine learning techniques are you planning to apply?
Intended experiments:
- What aspects of the data will you use?
- What experiments are you planning to run?
- How do you plan to evaluate your machine learning algorithm?

This will be followed by a 5-minute question period, which counts as part of your presentation, for a total of 10 minutes. The description and analysis of the problem should persuade the audience that it is a worthwhile problem to study.

Submission

You will submit your PowerPoint presentation file as a group through Blackboard.

Grading

The presentation is worth 5 points:

Motivation & Research Aim (1 point)
Data & Features (1 point)
Method & Experiments (1 point)
Q & A (1 point)
Slide Quality (0.25 point)
Presentation Quality (0.5 point)
References (0.25 point)

Detailed Proposal Presentation Grading Rubric

Examples

Although the examples are from Machine Learning, the expectations are very similar.

Midpoint Presentation

Due Date: Tue., Mar. 26 @ 9:00 am

Deliverable

You will create a presentation that summarizes your midpoint report. Your presentation should be 12 minutes in length, which means you should have 12 - 16 slides. It will be followed by an 8-minute question period.

The midpoint presentation will help keep your project on track. It needs to describe what you’ve accomplished thus far and briefly state your next experiments and steps. It should be presented as an “early draft” of what will ultimately be your final presentation. The goal is to prepare for your final presentation, for which you will be able to reuse many of the components you present here.

When preparing your midpoint presentation, be mindful that the intended audience is people who understand machine learning. As a result, you should not spend 5 minutes explaining how a support vector machine classifier works. Rather, you should summarize the main concept behind the algorithm and focus more on the reasoning behind your experiments and the implications of your results. Also, make sure that you include sufficient related work in your introductory slides.

Deliverable

Your presentation should be 12 minutes in duration and should cover the following:

MOTIVATION & RESEARCH AIM (1-2 minutes)
Explain the problem and why it is important. Discuss your motivation for pursuing this problem. Give some background if necessary. Clearly state what the input and output of your model are. Be very explicit: “The input to our algorithm is an {image, amplitude, patient age, rainfall measurements, grayscale video, etc.}. We then use a {neural network, linear regression, etc.} to output a predicted {age, stock price, cancer type, music genre, etc.}.” This is very important since different teams have different inputs/outputs spanning different application domains. Being explicit about this makes it easier for your audience to follow. Also include a figure demonstrating the overall workflow of your idea.
RELATED WORK (1-2 minutes)
You should find existing papers, group them into categories based on their approaches, and discuss their strengths and weaknesses, as well as how they are similar to and differ from your work. In your opinion, which approaches were clever/good? What is the state-of-the-art? Do most people perform the task by hand? You should aim to have at least 5 references in the related work. Include previous attempts by others at your problem, previous technical methods, or previous learning algorithms. Google Scholar is very useful for this (you can click “cite” and it generates MLA, APA, BibTeX, etc.), and you can copy the references onto your last slide.
MATERIALS AND METHODS (3-4 minutes)
You should have the following two subsections:
Dataset and Features
Describe your dataset: how many training/validation/test examples do you have? Is there any preprocessing you did? What about normalization or data augmentation? What is the resolution of your images? How is your time-series data discretized? Include a citation for the source of your dataset. Depending on available space, show some examples from your dataset. You should also talk about the features you used. If you extracted features using Fourier transforms, word2vec, histogram of oriented gradients (HOG), PCA, ICA, etc., make sure to talk about it. Try to include examples of your data in the presentation (e.g. include an image, show a waveform, etc.).
Methods
Formally describe your learning algorithms or proposed algorithm(s). Make sure to include relevant mathematical notation. For example, you can briefly include the formula for an RNN or say what the softmax function is. It is okay to use formulas from lecture notes. For each algorithm, give a short description (a few bullet points) of how it works. Again, we are looking for your understanding of how these machine learning algorithms work. Additionally, if you are using a niche or cutting-edge algorithm, you may want to explain your algorithm a bit more.
You should also give details about what (hyper)parameters you chose (e.g. why did you use X learning rate for gradient descent, what was your mini-batch size and why) and how you chose them. How did you split your dataset? Before you list your results, make sure to list and explain what your primary metrics are: accuracy, precision, AUC, etc. Provide equations for the metrics if necessary. For results, you want to have a mixture of tables and plots. If you are solving a classification problem, you should include a confusion matrix or AUC/AUPRC curves. Include performance metrics such as precision, recall, and accuracy. For regression problems, state the average error. Some of the experimental setups can also be summarized as a table to make it clearer for the reader to follow along.
PRELIMINARY RESULTS AND NEXT STEPS (5-6 minutes)
- Describe the experiments that you’ve run, the outcomes, and any error analysis that you’ve done.
- You should have tried at least one baseline model to compare your model implementation and architecture.
- Given your preliminary results, what are the next steps that you’re considering?
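
As an illustration of the metrics and the baseline comparison requested above, the following is a minimal sketch in plain Python, assuming a binary classification task with labels 0 and 1. The function names (confusion_counts, classification_metrics, majority_baseline) and the toy labels are hypothetical and not part of the course materials; in practice you would likely use a library implementation and your own model's predictions.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), the four cells of a binary confusion matrix."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        # Each ratio falls back to 0.0 when its denominator is zero.
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def majority_baseline(y_train, n_predictions):
    """Trivial baseline: always predict the most frequent training label."""
    majority_label = Counter(y_train).most_common(1)[0][0]
    return [majority_label] * n_predictions


# Toy labels for illustration only (not real results).
y_train = [0, 0, 1, 0, 1]
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
model_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # stand-in for your model's output
baseline_pred = majority_baseline(y_train, len(y_test))

print("model:   ", classification_metrics(y_test, model_pred))
print("baseline:", classification_metrics(y_test, baseline_pred))
```

Reporting the same metrics for the baseline and for your model side by side, for example in one table, makes it easier for the audience to judge how much your model actually improves on a trivial predictor.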

Submission

You will submit your presentation as a PowerPoint (.pptx) and PDF file as a group through Blackboard.

Grading

The midpoint presentation is worth 10 points:

Motivation & Research Aim (0.5 point)
Related Work (0.5 point)
Data & Features (1 point)
Materials and Method (2 points)
Preliminary Results and Next Steps (3 points)
Q & A (2 points)
Slide Quality (0.25 point)
Presentation Quality (0.5 point)
References (0.25 point)

Detailed Midpoint Presentation Rubric

Examples

Although the examples are from Machine Learning, the expectations are very similar.

Project Check-in

Due Date: Tue., Apr. 9 @ 9:00 am

You will provide a 3-minute update on your project. This is to ensure that your group is on track to finish the project.

Deliverable

You may either have a 1-3 slide presentation or just verbally convey the updates.

Submission

Either submit a copy of your presentation or indicate in the submission window that you will give a verbal update.

Grading

The check-in is worth 5 points:

Progress Updates (2.5 points)
Presentation Clarity and Quality (2.5 points)

Detailed Check-in Rubric

Final Presentation

Due Date: Tue., Apr. 23 @ 9:00 am

You will create a presentation that summarizes your project. Your presentation should be 10 minutes in length, which means you should have 10 - 15 slides. It will be followed by a 5-minute question period.

Deliverable

Presentation

Your presentation should be 10 minutes in duration and should cover the following:

MOTIVATION & RESEARCH AIM (30 seconds - 1 minute)
Explain the problem and why it is important. Discuss your motivation for pursuing this problem. Give some background if necessary. Clearly state what the input and output of your model are. Be very explicit: “The input to our algorithm is an {image, amplitude, patient age, rainfall measurements, grayscale video, etc.}. We then use a {neural network, linear regression, etc.} to output a predicted {age, stock price, cancer type, music genre, etc.}.” This is very important since different teams have different inputs/outputs spanning different application domains. Being explicit about this makes it easier for your audience to follow. Also include a figure demonstrating the overall workflow of your idea.
RELATED WORK (30 seconds - 1 minute)
You should find existing papers, group them into categories based on their approaches, and discuss their strengths and weaknesses, as well as how they are similar to and differ from your work. In your opinion, which approaches were clever/good? What is the state-of-the-art? Do most people perform the task by hand? You should aim to have at least 5 references in the related work. Include previous attempts by others at your problem, previous technical methods, or previous learning algorithms. Google Scholar is very useful for this (you can click “cite” and it generates MLA, APA, BibTeX, etc.), and you can copy the references onto your last slide.
MATERIALS AND METHODS (3-4 minutes)
You should have the following two subsections:
Dataset and Features
Describe your dataset: how many training/validation/test examples do you have? Is there any preprocessing you did? What about normalization or data augmentation? What is the resolution of your images? How is your time-series data discretized? Include a citation for the source of your dataset. Depending on available space, show some examples from your dataset. You should also talk about the features you used. If you extracted features using Fourier transforms, word2vec, histogram of oriented gradients (HOG), PCA, ICA, etc., make sure to talk about it. Try to include examples of your data in the presentation (e.g. include an image, show a waveform, etc.).
Methods
Formally describe your learning algorithms or proposed algorithm(s). Make sure to include relevant mathematical notation. For example, you can briefly include the formula for an RNN or say what the softmax function is. It is okay to use formulas from lecture notes. For each algorithm, give a short description (a few bullet points) of how it works. Again, we are looking for your understanding of how these machine learning algorithms work. Additionally, if you are using a niche or cutting-edge algorithm, you may want to explain your algorithm a bit more.
You should also give details about what (hyper)parameters you chose (e.g. why did you use X learning rate for gradient descent, what was your mini-batch size and why) and how you chose them. How did you split your dataset? Before you list your results, make sure to list and explain what your primary metrics are: accuracy, precision, AUC, etc. Provide equations for the metrics if necessary. For results, you want to have a mixture of tables and plots. If you are solving a classification problem, you should include a confusion matrix or AUC/AUPRC curves. Include performance metrics such as precision, recall, and accuracy. For regression problems, state the average error. Some of the experimental setups can also be summarized as a table to make it clearer for the reader to follow along.
RESULTS & DISCUSSION (5-6 minutes)
It is important for you to highlight any nuances in your findings. Make tables of your results. Overall, you should have both quantitative measures of your results and qualitative explanations of your results. It would be good to include an experimental comparison of your method's results with those of baseline methods. Include visualizations of results, heatmaps, examples of where your algorithm failed, and a discussion of why certain algorithms failed or succeeded. In addition, explain whether you think your model overfits your training set (or has any other limitations) and what, if anything, you did to mitigate that. Make sure to discuss the figures/tables throughout this section. Your plots should include legends and axis labels, and have font sizes that are legible when printed.
This is where you share your thoughts about your project. (Hopefully you have a few interesting interpretations!) Briefly summarize what just happened. Briefly explain whether or not you expected your results. If your results were good, explain why. If they were not good, explain why.
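
One common way to judge overfitting is to compare performance on the training set with performance on held-out validation and test sets. The sketch below shows one possible seeded random split in plain Python. The function name split_dataset, the default fractions, and the seed are illustrative assumptions rather than course requirements.

```python
import random


def split_dataset(examples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into train/validation/test lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to less than 1")

    indices = list(range(len(examples)))
    # A private Random instance keeps the split independent of global RNG state.
    random.Random(seed).shuffle(indices)

    n_test = round(len(indices) * test_frac)
    n_val = round(len(indices) * val_frac)

    test = [examples[i] for i in indices[:n_test]]
    val = [examples[i] for i in indices[n_test:n_test + n_val]]
    train = [examples[i] for i in indices[n_test + n_val:]]
    return train, val, test


# Toy usage: 100 integer "examples" split 70/10/20.
train, val, test = split_dataset(list(range(100)), val_frac=0.1, test_frac=0.2, seed=1)
print(len(train), len(val), len(test))
```

A large gap between training and validation metrics is one sign of overfitting. A purely random split may be unsuitable for time-series or grouped data, where a chronological or per-group split is usually more appropriate.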

Code

Your project and results should be reproducible. There might be small differences in results due to randomization and different hardware systems, but your code should run and provide similar results. Please provide one of the following: a zip file that includes the code and data, a link to a GitHub repository, or a Python notebook that will download your data and produce the results in your final project.
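
One source of run-to-run differences is unseeded random number generation. The following sketch fixes the seed of Python's random module and, if they happen to be installed, of NumPy and PyTorch. The helper name set_seed and the default seed value are hypothetical, and seeding reduces but does not necessarily eliminate variation across hardware.

```python
import random


def set_seed(seed=0):
    """Seed the random number generators that are available in this environment."""
    random.seed(seed)

    # NumPy and PyTorch are optional here; skip them if they are not installed.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


# Calling set_seed with the same value reproduces the same random draws.
set_seed(42)
first_run = [random.random() for _ in range(3)]
set_seed(42)
second_run = [random.random() for _ in range(3)]
print(first_run == second_run)
```

Calling such a helper once at the top of your training script or notebook, and recording the seed you used alongside your results, makes it easier for others to rerun your experiments.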

Submission

You will submit on Blackboard:

your presentation as a PowerPoint (.pptx) and PDF file, as a group
your code either as a zip file or a link according to the above specifications
the contributions of each team member to the project

Grading

The final presentation is worth 30 points:

Presentation

Motivation & Research Aim (0.25 point)
Related Work (0.5 point)
Data & Features (2 points)
Materials and Method (3 points)
Results (6 points)
Discussion (5 points)
- Implications (3 points)
- Limitations (2 points)
Q & A (3 points)
Slide Quality (2 points)
Presentation Quality (2 points)
References (0.25 point)

Code

Working Code (1 point)

Technical Quality

Points will be given for going above and beyond. We are looking for some form of creativity, e.g. clever experiments that reveal an interesting phenomenon, tricks for circumventing obstacles, etc. (5 points)

Detailed Final Presentation Rubric

Examples

Although the examples are from Machine Learning, the expectations are very similar.
