• Class Number 9199
  • Term Code 3460
  • Class Info
  • Unit Value 6 units
  • Mode of Delivery In Person
  • COURSE CONVENER
    • Prof Hanna Suominen
    • Prof Jing Jiang
  • Class Dates
  • Class Start Date 22/07/2024
  • Class End Date 25/10/2024
  • Census Date 31/08/2024
  • Last Date to Enrol 29/07/2024

This course considers the “document” and its various genres, such as web pages, social media feeds, news items, and PDF brochures, as a fundamental object for business, government and community. The goal is to introduce concepts and hands-on tools for automated understanding of large amounts of text. For this, the course covers four broad areas: (A) information retrieval, (B) natural language processing, (C) machine learning for documents, and (D) relevant tools for the web. Tasks include content collection and extraction, formal and informal natural language processing, information extraction, information retrieval, classification and analysis. Fundamental probabilistic techniques for performing these tasks and some common software systems will be covered, though no area will be covered in great depth.

Learning Outcomes

Upon successful completion, students will have the knowledge and skills to:

  1. differentiate between the basic probabilistic theories of language and document structure, information retrieval, and classification, clustering and document feature engineering.
  2. identify the basic algorithms and software available for probabilistic theories of language and be proficient at using common libraries for natural language processing to perform basic analysis tasks.
  3. index a document collection for use in an information retrieval system. Demonstrate advanced knowledge of basic theories and algorithms to determine large scale named-entity matching and standardization of names within a collection.
  4. perform automated classification using probabilistic theories.

Research-Led Teaching

The teaching and learning activities below are founded on state-of-the-art research outcomes and currently ongoing research projects. For example, document analysis methods and applications are considered through the strategic Our Health In Our Hands (OHIOH) initiative of The ANU, created in partnership with ACT Health.

Required Resources

A laptop or desktop with a reliable internet connection is required for accessing the course material on Wattle and for completing the practicals and assignments. Python and Jupyter Notebook will be used extensively in this course so being able to install freely available software will be necessary. An alternative is to have access to a laptop or desktop where appropriate software is already installed.

Further details on software used, and instructions, can be found on the Wattle site for the course.

The following textbooks will help you to better understand the course material and broaden your understanding. They are both provided online for free by the authors and are highly recommended. The first book covers IR in a very approachable way, but it goes into much more depth than we will cover in this course. The second book is a currently evolving new edition of one of the best NLP textbooks. This book is up to date with the latest approaches and covers many topics in much greater depth than we will cover in this course.

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press. 2008.

Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd ed. draft). 2023.

Staff Feedback

Students will be given feedback in the following forms: 

  • Written marks and feedback for the online quizzes on Wattle.
  • Written marks and feedback on the assignments.
  • Written marks and feedback for the final exam.
  • Casual verbal feedback and comments during lectures and computing labs, supported by discussion forum posts.


Marks and feedback for assessment items, with the exception of the final exam, are returned to students within 2 weeks of the respective submission dates, using Wattle grade books and/or Wattle quizzes. If you have questions related to your mark, you must contact Hanna Suominen within 14 days of the mark release. In accordance with the ANU examination policies, final exam marks are released only after the final marks for the entire unit have been released by the ANU.


Note that consistent scaling for each of the units may occur with the final marks. Students must get a minimum final overall mark of at least 50/100 (50%) to pass the subject. Final marks are moderated by the examiners' meeting at the School of Computing. Supplementary assessment will be awarded to those students with an overall unit mark of between 45 and 49.

Student Feedback

ANU is committed to the demonstration of educational excellence and regularly seeks feedback from students. Students are encouraged to offer feedback directly to their Course Convener or through their College and Course representatives (if applicable). The feedback given in these surveys is anonymous and provides the Colleges, University Education Committee and Academic Board with opportunities to recognise excellent teaching, and opportunities for improvement. The Surveys and Evaluation website provides more information on student surveys at ANU and reports on the feedback provided on ANU courses.

Other Information

Gen AI Tools are ALLOWED: 

The use of Generative AI Tools (e.g., ChatGPT) is permitted in this course, given that proper citation and prompts are provided, along with a description of how the tool contributed to the assignment. Guidelines regarding appropriate citation and use can be found on the ANU library website at https://libguides.anu.edu.au/generative-ai . Marks will reflect the contribution of the student rather than the contribution of the tools. Further guidance on appropriate use should be directed to the convener for this course.

Class Schedule

Week/Session | Summary of Activities | Assessment
1 | Course Introduction & Logistics; Overview of Natural Language Processing (NLP) Tasks | Wattle Quiz 0 (ungraded, open book)
2 | Boolean Retrieval; Ranked Retrieval; Computing Lab 1 | Assignment 1 Specification Released on Wattle
3 | Evaluation of Information Retrieval (IR) Systems; Web Search Basics; Computing Lab 2 | Wattle Quiz 1 (open book)
4 | Machine Learning (ML) Basics I and II | Assignment 1 (open book) Submission due on Wattle; Assignment 2 Specification Released on Wattle
5 | Representation in NLP; Clustering; Computing Lab 3 | -
6 | Deep Neural Networks I and II; Computing Lab 4 | Assignment 1 Mark Released on Wattle
7 | Seq2Seq and Attention; Transformers | -
8 | Pre-trained Language Models I and II; Computing Lab 5 | Wattle Quiz 2 (open book)
9 | Language Modelling I and II; Computing Lab 6 | -
10 | Syntactic Parsing; Semantics | Assignment 2 (open book) Submission due on Wattle
11 | Evaluation in NLP; Multilingual and Low Resource NLP; (Option for a Drop-in Computing Lab, depending on the need) | Wattle Quiz 3 (open book)
12 | Course Review; (Option for a Drop-in Lecture, depending on the need) | Assignment 2 Mark Released on Wattle
13 | Examination Period | Final Exam (3 hours, in-person)

Tutorial Registration

ANU utilises MyTimetable to enable students to view the timetable for their enrolled courses, browse, then self-allocate to small teaching activities / tutorials so they can better plan their time. Find out more on the Timetable webpage (https://www.anu.edu.au/students/program-administration/timetabling) and register for computing labs.

Assessment Summary

Assessment task | Value | Learning Outcomes
Online Quizzes (from 0 to 10 marks, individual assessment item, open book, on Wattle) | 10 % | 1,2,3,4,5,6
Assignment 1 (from 0 to 10 marks, individual assessment item, open book, on Wattle) | 10 % | 1,2,3,4
Assignment 2 (from 0 to 25 marks, individual assessment item, open book, on Wattle) | 25 % | 1,2,3,5,6
Final Exam (from 0 to 55 marks, individual assessment item) | 55 % | 1,2,3,4,5,6

* If the Due Date and Return of Assessment date are blank, see the Assessment Tab for specific Assessment Task details
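
Because the four assessment tasks are marked out of 10, 10, 25 and 55, their maximum marks sum to 100, so the unscaled overall mark is simply the sum of the task marks. The following is a minimal sketch of that arithmetic only; the function name `overall_mark` and the dictionary keys are hypothetical, and the sketch does not model any scaling or moderation applied by the examiners' meeting.

```python
# Maximum marks per assessment task, as listed in the Assessment Summary.
MAX_MARKS = {
    "quizzes": 10,
    "assignment_1": 10,
    "assignment_2": 25,
    "final_exam": 55,
}


def overall_mark(marks):
    """Sum the task marks into an unscaled overall mark out of 100.

    `marks` maps each key of MAX_MARKS to the mark earned for that task.
    Scaling and moderation are NOT modelled here (illustrative only).
    """
    for task, maximum in MAX_MARKS.items():
        if not 0 <= marks[task] <= maximum:
            raise ValueError(f"{task} mark must be between 0 and {maximum}")
    return sum(marks[task] for task in MAX_MARKS)


# A student at the "good understanding" level (70%) on every task.
example = {
    "quizzes": 7,
    "assignment_1": 7,
    "assignment_2": 17.5,
    "final_exam": 38.5,
}
print(overall_mark(example))
```

With 70% on every task, the sum is 7 + 7 + 17.5 + 38.5 = 70 out of 100.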

Policies

ANU has educational policies, procedures and guidelines, which are designed to ensure that staff and students are aware of the University’s academic standards, and implement them. Students are expected to have read the Academic Misconduct Rule before the commencement of their course. Other key policies and guidelines include:

Assessment Requirements

The ANU is using Turnitin to enhance student citation and referencing techniques, and to assess assignment submissions as a component of the University's approach to managing Academic Integrity. For additional information regarding Turnitin, please visit the ANU Online website. Students may choose not to submit assessment items through Turnitin. In this instance you will be required to submit, alongside the assessment item itself, hard copies of all references included in the assessment item.

Moderation of Assessment

Marks that are allocated during Semester are to be considered provisional until formalised by the College examiners meeting at the end of each Semester. If appropriate, some moderation of marks might be applied prior to final results being released.

Examination(s)

The final examination will be a three-hour, on-campus, invigilated exam. Centrally scheduled examinations through Examinations, Graduations & Prizes will be timetabled prior to the examination period. Please check the details above and ANU Timetabling for further information.

Assessment Task 1

Value: 10 %
Learning Outcomes: 1,2,3,4,5,6

Online Quizzes (from 0 to 10 marks, individual assessment item, open book, on Wattle)

Three online Wattle quizzes will be offered. Marks for quizzes 1, 2, and 3 will be scaled to contribute 3%, 4%, and 3% to the overall course mark, respectively. Good understanding of all of the relevant learning objectives will be typically marked as 7/10 (70%).
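
The scaling described above can be sketched as follows. This is an illustration only: the weights 3, 4 and 3 come from the paragraph above, but the function name `quiz_contribution` and the raw maximum score of each quiz are assumptions, since the outline does not state how many raw points each quiz carries.

```python
# Percentage points each graded quiz contributes to the overall course mark.
QUIZ_WEIGHTS = {1: 3.0, 2: 4.0, 3: 3.0}


def quiz_contribution(raw_scores, max_scores):
    """Return the marks (out of 10) that Quizzes 1-3 add to the course mark.

    Both arguments are dicts keyed by quiz number (1, 2, 3).
    The raw maximum scores are hypothetical inputs, not values from the outline.
    """
    total = 0.0
    for quiz, weight in QUIZ_WEIGHTS.items():
        # Scale the fraction of raw points earned by the quiz's weight.
        total += weight * raw_scores[quiz] / max_scores[quiz]
    return total


# Hypothetical raw scores: 70% on each quiz, with assumed raw maxima.
raw = {1: 7, 2: 14, 3: 21}
maxima = {1: 10, 2: 20, 3: 30}
print(round(quiz_contribution(raw, maxima), 2))
```

A student scoring 70% on each quiz earns 2.1 + 2.8 + 2.1 = 7 of the 10 available marks, which matches the 7/10 "good understanding" level.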

Only one attempt is permitted for each quiz. Automated feedback on correct answers is given once the due date is passed and all (pre-arranged) late submissions are collected. Submission deadlines are provided on Wattle.

No late submissions without a pre-arranged extension.


Wattle Quiz 1 is specifically designed as a pre-census task, so that students receive feedback before the census date.


In addition to these three quizzes, Wattle Quiz 0 (ungraded, open book) will be offered to

1) help students familiarise themselves with the Wattle quiz functionality before the graded Wattle Quizzes 1-3,

2) improve students' understanding of their own starting point and study needs, and

3) help teaching staff understand students' abilities at the start of the course, to support further tailoring of the 2024 course to its students.

Assessment Task 2

Value: 10 %
Learning Outcomes: 1,2,3,4

Assignment 1 (from 0 to 10 marks, individual assessment item, open book, on Wattle)

A programming assignment that also requires you to provide written answers to questions. Covers the Information Retrieval material in the course. Details provided through Wattle. Submission is through Wattle. Submission deadline is provided on Wattle. Good understanding of all of the relevant learning objectives will be typically marked as 7/10 (70%).

No late submissions without a pre-arranged extension.

Assessment Task 3

Value: 25 %
Learning Outcomes: 1,2,3,5,6

Assignment 2 (from 0 to 25 marks, individual assessment item, open book, on Wattle)

A programming assignment that also requires you to provide written answers to questions. Covers the Machine Learning and Natural Language Processing material in the course. Details provided through Wattle. Submission is through Wattle. Submission deadline provided on Wattle. Good understanding of all of the relevant learning objectives will be typically marked as 17.5/25 (70%).

No late submissions without a pre-arranged extension.

Assessment Task 4

Value: 55 %
Learning Outcomes: 1,2,3,4,5,6

Final Exam (from 0 to 55 marks, individual assessment item)

The Final exam will be a closed-book exam, scheduled by the ANU Examinations Office as an in-person individual examination. Detailed information will be provided via the Wattle course site. Good understanding of all learning objectives will be typically marked as 38.5/55 (70%).

No late submissions without a pre-arranged extension.

Academic Integrity

Academic integrity is a core part of our culture as a community of scholars. At its heart, academic integrity is about behaving ethically. This means that all members of the community commit to honest and responsible scholarly practice and to upholding these values with respect and fairness. The Australian National University commits to embedding the values of academic integrity in our teaching and learning. We ensure that all members of our community understand how to engage in academic work in ways that are consistent with, and actively support academic integrity. The ANU expects staff and students to uphold high standards of academic integrity and act ethically and honestly, to ensure the quality and value of the qualification that you will graduate with. The University has policies and procedures in place to promote academic integrity and manage academic misconduct. Visit the following Academic honesty & plagiarism website for more information about academic integrity and what the ANU considers academic misconduct. The ANU offers a number of services to assist students with their assignments, examinations, and other learning activities. The Academic Skills and Learning Centre offers a number of workshops and seminars that you may find useful for your studies.

Online Submission

You will be required to electronically submit all your quizzes and assignments on Wattle. Submissions of reports and code will be run through Turnitin and/or Moss to assess submissions as an approach to managing Academic Integrity. You will be required to electronically sign a declaration as part of the submission of your assignments. Please keep a copy of the assignments for your records. Unless an exemption has been approved by the Associate Dean (Education) submission must be through Wattle.

Hardcopy Submission

None. All assessment submissions (except the final exam) are electronic through Wattle.

Late Submission

Individual assessment tasks may or may not allow for late submission. Policy regarding late submission is detailed below: Late submission not permitted. If submission of assessment tasks without an extension after the due date is not permitted, a mark of 0 will be awarded.

Referencing Requirements

Accepted academic practice for referencing sources that you use in presentations can be found via the links on the Wattle site, under the file named “ANU and College Policies, Program Information, Student Support Services and Assessment”. Alternatively, you can seek help through the Students Learning Development website.

Returning Assignments

Feedback for assignments will be provided through Wattle.

Extensions and Penalties

Extensions and late submission of assessment pieces are covered by the Student Assessment (Coursework) Policy and Procedure. The Course Convener may grant extensions for assessment pieces that are not examinations or take-home examinations. If you need an extension, you must request it in writing on or before the due date. If you have documented and appropriate medical evidence that demonstrates you were not able to request an extension on or before the due date, you may be able to request it after the due date.

Resubmission of Assignments

Students will not be permitted to resubmit assignments.

Privacy Notice

The ANU has made a number of third-party online databases available for students to use. Use of each online database is conditional on student end users first agreeing to the database licensor’s terms of service and/or privacy policy. Students should read these carefully. In some cases student end users will be required to register an account with the database licensor and submit personal information, including their first name, last name, ANU email address, and other information. In cases where student end users are asked to submit ‘content’ to a database, such as an assignment or short answers, the database licensor may only use the student’s ‘content’ in accordance with the terms of service, including any (copyright) licence the student grants to the database licensor. Any personal information or content a student submits may be stored by the licensor, potentially offshore, and will be used to process the database service in accordance with the licensor’s terms of service and/or privacy policy. If any student chooses not to agree to the database licensor’s terms of service or privacy policy, the student will not be able to access and use the database. In these circumstances students should contact their lecturer to enquire about alternative arrangements that are available.

Distribution of grades policy

Academic Quality Assurance Committee monitors the performance of students, including attrition, further study and employment rates and grade distribution, and College reports on quality assurance processes for assessment activities, including alignment with national and international disciplinary and interdisciplinary standards, as well as qualification type learning outcomes. Since first semester 1994, ANU uses a grading scale for all courses. This grading scale is used by all academic areas of the University.

Support for students

The University offers students support through several different services. You may contact the services listed below directly or seek advice from your Course Convener, Student Administrators, or your College and Course representatives (if applicable).
Prof Hanna Suominen
+61261253257
COMP4650@anu.edu.au

Research Interests


Deep Learning, Educational Technology, Health Informatics, Machine Learning, Natural Language Processing, Performance Evaluation

Prof Hanna Suominen

By Appointment
Prof Jing Jiang
+61261253257
jing.jiang@anu.edu.au

Research Interests


Deep Learning, Educational Technology, Health Informatics, Machine Learning, Natural Language Processing, Performance Evaluation

Prof Jing Jiang

Sunday
