Learners will gain the skills to serve powerful language models as practical, scalable web APIs. They will learn how to use the llama.cpp example server to expose a large language model through REST API endpoints for tasks such as text generation, tokenization, and embedding extraction.
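As a sketch of what those endpoints look like, the snippet below builds the JSON request bodies that the llama.cpp example server accepts for its /completion, /tokenize, and /embedding routes. The base URL is a placeholder for a locally running server, and field names such as n_predict and content should be checked against the server README for your llama.cpp version:

```python
import json

# Placeholder base URL for a llama.cpp example server running locally.
BASE_URL = "http://127.0.0.1:8080"

def completion_request(prompt: str, n_predict: int = 64) -> tuple[str, bytes]:
    """Build the URL and JSON body for a text-generation request."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode()
    return BASE_URL + "/completion", body

def tokenize_request(text: str) -> tuple[str, bytes]:
    """Build the URL and JSON body for a tokenization request."""
    body = json.dumps({"content": text}).encode()
    return BASE_URL + "/tokenize", body

def embedding_request(text: str) -> tuple[str, bytes]:
    """Build the URL and JSON body for an embedding request."""
    body = json.dumps({"content": text}).encode()
    return BASE_URL + "/embedding", body

url, body = completion_request("Explain llamafile in one sentence.")
```

Each (url, body) pair could then be POSTed with any HTTP client, for example urllib.request or curl, once a server is actually listening.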



Beginning Llamafile for Local Large Language Models (LLMs)


Instructor: Noah Gift
What you'll learn
Learn how to serve large language models as production-ready web APIs using the llama.cpp framework
Understand the architecture and capabilities of the llama.cpp example server for text generation, tokenization, and embedding extraction
Gain hands-on experience in configuring and customizing the server using command line options and API parameters
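To give a feel for the command-line configuration mentioned above, here is a minimal sketch of a launch command for the llama.cpp example server. The binary name llama-server and the -m, -c, --host, and --port flags exist in recent llama.cpp builds, but the model file name is a placeholder and every flag should be verified against your build's --help:

```python
import shlex

# Hypothetical launch command for the llama.cpp example server.
# The model path is a placeholder; adjust flags per `llama-server --help`.
cmd = (
    "llama-server "
    "-m models/mixtral-8x7b-instruct.Q4_K_M.gguf "  # GGUF model file (placeholder)
    "-c 4096 "                                      # context window, in tokens
    "--host 127.0.0.1 --port 8080"                  # where the REST API listens
)
argv = shlex.split(cmd)
# subprocess.Popen(argv) would start the server; omitted here since no
# model file is present.
```

Splitting the command with shlex keeps the sketch runnable without a model on disk, while showing exactly which arguments a real launch would pass.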
Details to know

4 assignments
Earn a career certificate you can add to your LinkedIn profile, resume, or CV

There is 1 module in this course
In this module, you will run large language models locally using the Mixtral model and llamafile, keeping your data private and avoiding network latency and per-token fees.
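A llamafile bundles the model weights and the llama.cpp runtime into a single executable, so running one locally amounts to marking the download executable and starting it. The sketch below illustrates this; the file name is a placeholder, and the --server and --port flags are assumptions to verify against the llamafile README for your release:

```python
import os
import shlex
import stat

# Placeholder file name for a downloaded Mixtral llamafile.
llamafile = "mixtral-8x7b-instruct-v0.1.Q4_K_M.llamafile"

def make_executable(path: str) -> None:
    """Set the executable bits on a file (the equivalent of `chmod +x`)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

# Command to start the bundled server on localhost; flag names are
# assumptions to check against `--help` for your llamafile version.
cmd = shlex.split(f"./{llamafile} --server --port 8080")
```

Once started, the embedded server speaks the same REST API as the llama.cpp example server, so the same client code works against either.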
What's included
8 videos · 18 readings · 4 assignments · 1 discussion prompt · 4 ungraded labs