Python Data Cleaning Cookbook - Second Edition: Prepare your data for analysis with pandas, NumPy, Matplotlib, scikit-learn, and OpenAI
Paperback
ISBN13: 9781803239873
Publisher: Packt Publishing
Published: May 31 2024
Pages: 486
Weight: 1.82
Height: 0.98 Width: 7.50 Depth: 9.25
Language: English
Learn the intricacies of data description, issue identification, and practical problem-solving, armed with essential techniques and expert tips.
Key Features:
- Get to grips with new techniques for data preprocessing and cleaning for machine learning and NLP models
- Use new and updated AI tools and techniques for data cleaning tasks
- Clean, monitor, and validate large data volumes to diagnose problems using cutting-edge methodologies, including machine learning and AI
Book Description:
Jumping into data analysis without proper data cleaning will certainly lead to incorrect results. The Python Data Cleaning Cookbook will show you tools and techniques for cleaning and handling data with Python for better outcomes.
Fully updated to the latest version of Python and all relevant tools, this book will teach you how to manipulate and clean data to get it into a useful form. The current edition emphasizes advanced techniques, such as machine learning and AI-specific approaches and tools for data cleaning, alongside the conventional ones. The book also delves into tips and techniques to process and clean data for ML, AI, and NLP models. You will learn how to filter and summarize data to gain insights and better understand what makes sense and what does not, and discover how to operate on data to address the issues you've identified. Next, you'll cover recipes for using supervised learning and Naive Bayes analysis to identify unexpected values and classification errors, and generate visualizations for exploratory data analysis (EDA) to identify unexpected values. Finally, you'll build functions and classes that you can reuse without modification when you have new data.
By the end of this data cleaning book, you'll know how to clean data and diagnose problems within it.
What You Will Learn:
- Use OpenAI tools for various data cleaning tasks
- Produce summaries of the attributes of datasets, columns, and rows
- Anticipate data cleaning issues when importing tabular data into pandas
- Apply validation techniques for imported tabular data
- Improve your productivity in Python pandas by using method chaining
- Recognize and resolve common issues like dates and IDs
- Set up indexes to streamline data issue identification
- Use data cleaning to prepare your data for ML and AI models
Who this book is for:
This book is for anyone looking for ways to handle messy, duplicate, and poor-quality data using different Python tools and techniques. The book takes a recipe-based approach to help you learn how to clean and manage data with practical examples.
Working knowledge of Python programming is all you need to get the most out of the book.
Also from
Walker, Michael
Freestyle Cookbook: Discover the Best Freestyle Cookbook Recipes For Beginners - Delicious And Healthy Cooking: With Sally P. Bean & Heidi Naquin & Si
Powell, Jelly C.
Walker, Michael
Davis, Allyson S.
Paperback
Python Data Cleaning Cookbook: Modern techniques and Python tools to detect and remove dirty data and extract key insights
Walker, Michael
Paperback
Sylvius Leopold Weiss: 5 Baroque Sonatas from the London Manuscript Arranged For Baritone Ukulele
Walker, Michael
Paperback
Sylvius Leopold Weiss: 5 Baroque Sonatas from the London Manuscript Arranged For Low G Ukulele
Walker, Michael
Paperback
Giuseppe Antonio Brescianello: 18 Partitas for Gallichone Arranged For Baritone Ukulele
Walker, Michael
Paperback
Antoine de L'Hoyer: Grande Sonata and Variations In Tablature and Modern Notation For Baritone Ukulele
Walker, Michael
Hardcover
Federico Moreno Torroba: Castles of Spain and Puertas de Madrid In Tablature and Modern Notation For Baritone Ukulele
Walker, Michael
Paperback
Sylvius Leopold Weiss: The Corsaire Suite and L'Infidele Arranged For Baritone Ukulele
Walker, Michael
Paperback
Francesco Molino: Three Sonatas and Six Themes with Variations For Low G Ukulele
Walker, Michael
Paperback
ROMANCE! Compositions from the 19th Century Romantic Movement for Low G Ukulele
Walker, Michael
Paperback
Melchioro de Barberis & Giacomo Gorzanis: Music of the Italian Renaissance For the Low G Ukulele
Walker, Michael
Paperback
The Renaissance Lute: Compositions For Low G Ukulele and Other Four Course Instruments
Walker, Michael
Paperback
Dionisio Aguado: Four Easy Waltzes Opus 7 and Six Petite Pieces Opus 4 In Tablature and Modern Notation For Baritone Ukulele
Walker, Michael
Paperback
Pietro Paolo Borrono: Songs and Dances From the Renaissance In Tablature and Modern Notation For Baritone Ukulele
Walker, Michael
Paperback
Johann Anton Logy: Partitas In Tablature and Modern Notation For Baritone Ukulele
Walker, Michael
Paperback
Mauro Giuliani: Opus 12 - Monferrines & Opus 73 - Bagatelle Per La Chitarra For Low G Ukulele
Walker, Michael
Paperback
Mauro Giuliani: Les Variétés Amusantes Opus 43 and Opus 54 For Low G Ukulele
Walker, Michael
Paperback
Antoine de L'Hoyer: Grande Sonata and Variations In Tablature and Modern Notation For Low G Ukulele
Walker, Michael
Hardcover
Mauro Giuliani 18 Etudes Opus 51 In Tablature and Modern Notation For Baritone Ukulele
Walker, Michael
Paperback
Mauro Giuliani Studies & Etudes Opus 50, Opus 48 and Selected Pieces In Tablature and Modern Notation For Baritone Ukulele
Walker, Michael
Paperback
Antonio Rotta: Intabolatura de Lauto Lute Music of the Renaissance for Low G Ukulele
Walker, Michael
Paperback
Sylvius Leopold Weiss: 4 More Baroque Sonatas from the London Manuscript Arranged For Baritone Ukulele
Walker, Michael
Paperback
Giuseppe Antonio Brescianello 18 Partitas for Gallichone Arranged For Low G Ukulele
Walker, Michael
Paperback
Sylvius Leopold Weiss: The Corsaire Suite and L'Infidele Arranged For Low G Ukulele
Walker, Michael
Paperback
Francesco Molino: Three Sonatas and Six Themes with Variations For Baritone Ukulele
Walker, Michael
Paperback
Dionisio Aguado: 12 Waltzes Opus 1 and 8 Petite Pieces Opus 3 - For Low G Ukulele
Walker, Michael
Paperback
Dionisio Aguado: Four Easy Waltzes Opus 7 Six Petite Pieces Opus 4 For Low G Ukulele
Walker, Michael
Paperback
Pietro Paolo Borrono: Songs and Dances From the Renaissance For Low G Ukulele
Walker, Michael
Paperback
Domenico Bianchini: Lute Music of the Renaissance Arranged for Low G Ukulele
Walker, Michael
Paperback
Francesco da Milano: Ricercars, Fantasias, and Selected Pieces Volume 5 For Low G Ukulele
Walker, Michael
Paperback
Francesco da Milano: Ricercars, Fantasias, and Selected Pieces Volume 3 For Low G Ukulele
Walker, Michael
Paperback
Francesco da Milano: Ricercars, Fantasias, and Selected Pieces Volume 1 For Low G Ukulele
Walker, Michael
Paperback
Francesco da Milano: Ricercars and Fantasias Volume 3 For Baritone Ukulele and Other Four-Course Instruments
Walker, Michael
Paperback
Francesco da Milano: Volume 5: Ricercars, Fantasias and Selected Pieces Arranged for Baritone Ukulele
Walker, Michael
Paperback
Matteo Carcassi: Twelve Waltzes in Tablature and Modern Notation for Baritone Ukulele
Walker, Michael
Paperback
Ferdinando Carulli: Volume Two Dieciocho Pequeñas Piezas Opus 211 For Low G Ukulele
Walker, Michael
Paperback
Federico Moreno Torroba: Castles of Spain & Puertas de Madrid For Low G Ukulele
Walker, Michael
Paperback
Also in
Databases
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
Kleppmann, Martin
Paperback
The Definitive Guide to DAX: Business Intelligence for Microsoft Power BI, SQL Server Analysis Services, and Excel
Ferrari, Alberto
Russo, Marco
Paperback
Fundamentals of Data Engineering: Plan and Build Robust Data Systems
Housley, Matt
Reis, Joe
Paperback
Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python
Gedeck, Peter
Bruce, Peter
Bruce, Andrew
Paperback
The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
Kimball, Ralph
Ross, Margy
Paperback
Introduction to Statistics: An Intuitive Guide for Analyzing Data and Unlocking Discoveries
Frost, Jim
Paperback
Product Operations: How successful companies build better products at scale
Tilles, Denise
Perri, Melissa
Paperback
Data Engineering for Cybersecurity: Build Secure Data Pipelines with Free and Open-Source Tools
Bonifield, James
Paperback
SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL
Shields, Walter
Hardcover
High Performance Python: Practical Performant Programming for Humans
Ozsvald, Ian
Gorelick, Micha
Paperback
PR Technology, Data and Insights: Igniting a Positive Return on Your Communications Investment
Weiner, Mark
Paperback
Fusion Strategy: How Real-Time Data and AI Will Power the Industrial Future
Govindarajan, Vijay
Venkatraman, Venkat
Hardcover
Data Engineering Design Patterns: Recipes for Solving the Most Common Data Engineering Problems
Konieczny, Bartosz
Paperback
SQL for Data Analysis: Advanced Techniques for Transforming Data Into Insights
Tanimura, Cathy
Paperback
How to Interpret Data: Using Data to Improve Your Influence and Decision-Making
Kelly, Nicholas
Paperback
Football Analytics with Python & R: Learning Data Science Through the Lens of Sports
Eager, Eric A.
Erickson, Richard A.
Paperback
Databricks Certified Data Engineer Associate Study Guide: In-Depth Guidance and Practice
Alhussein, Derar
Paperback
SQL Cookbook: Query Solutions and Techniques for All SQL Users
Graaf, Robert de
Molinaro, Anthony
Paperback
Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale
Shapira, Gwen
Palino, Todd
Sivaram, Rajini
Paperback
SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL
Shields, Walter
Paperback
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition
Friedman, Jerome
Tibshirani, Robert
Hastie, Trevor
Hardcover
Mathletics: How Gamblers, Managers, and Fans Use Mathematics in Sports, Second Edition
Winston, Wayne L.
Nestler, Scott
Pelechrinis, Konstantinos
Paperback
Data Analytics & Visualization All-In-One for Dummies
Hyman, Jack A.
McFedries, Paul
Massaron, Luca
Paperback
Data and Reality: A Timeless Perspective on Perceiving and Managing Information in Our Imprecise World, 3rd Edition
Kent, William
Paperback
Non-Invasive Data Governance: The Path of Least Resistance and Greatest Success
Seiner, Robert
Paperback
PostgreSQL 16 Administration Cookbook: Solve real-world Database Administration challenges with 180+ practical recipes and best practices
Ciolli, Gianni
Mejías, Boriss
Angelakos, Jimmy
Paperback
Data Governance with Unity Catalog on Databricks: Implement Data and AI Governance with Databricks Data Intelligence Platform
Sreekumar, Kiran
Subbarao, Karthik
Paperback
Data Modeling with Microsoft Power BI: Self-Service and Enterprise Data Warehouse with Power BI
Ehrenmueller-Jensen, Markus
Paperback
Omnimics: An Executive Playbook for Artificial Intelligence and Analytics
Fetherling, J. Tod
Paperback
Data Literacy in Practice: A complete guide to data literacy and making smarter decisions with data through intelligent actions
Klidas, Angelika
Hanegan, Kevin
Paperback
Numerical Python: Scientific Computing and Data Science Applications with NumPy, SciPy and Matplotlib
Johansson, Robert
Paperback
Observability Engineering: Achieving Production Excellence
Miranda, George
Majors, Charity
Fong-Jones, Liz
Paperback
Collect, Combine, and Transform Data Using Power Query in Power BI and Excel
Maslyuk, Daniil
Raviv, Gil
Paperback
SAP S/4HANA Financial Accounting Certification Guide: Application Associate Exam
Pougkas, Stefanos
Paperback
Apache Iceberg: The Definitive Guide: Data Lakehouse Functionality, Performance, and Scalability on the Data Lake
Shiran, Tomer
Hughes, Jason
Merced, Alex
Paperback
A Friendly Guide to Data Science: Everything You Should Know about the Hottest Field in Tech
Vincent, Kelly P.
Paperback
Becoming a Data Head: How to Think, Speak, and Understand Data Science, Statistics, and Machine Learning
Gutman, Alex J.
Goldmeier, Jordan
Paperback
Data Analytics Made Easy: Analyze and present data to make informed decisions without writing any code
Mauro, Andrea de
Paperback
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
Riccomini, Chris
Kleppmann, Martin
Paperback
Learning Statistics with Jamovi: A Tutorial for Beginners in Statistical Analysis
Navarro, Danielle
Foxcroft, David
Paperback
Neo4j: The Definitive Guide: Hands-On Recipes for Production-Ready Graph Implementations
Willemsen, Christophe
Misquitta, Luanne
Paperback
Learning Tableau 2025 - Sixth Edition: Leverage Tableau's newest features to revolutionize your data storytelling with AI-enhanced insights
Milligan, Joshua N.
Paperback
Advanced Snowflake: Processing Data, Developing Applications, and Deploying ML Models at Scale
Ullah, Muhammad Fasih
Paperback
Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema
Corr, Lawrence
Stagnitto, Jim
Paperback
Data Science at the Command Line: Obtain, Scrub, Explore, and Model Data with UNIX Power Tools
Janssens, Jeroen
Paperback
Data Analytics with Hadoop: An Introduction for Data Scientists
Kim, Jenny
Bengfort, Benjamin
Paperback
Turning Data into Wisdom: How We Can Collaborate with Data to Change Ourselves, Our Organizations, and Even the World
Hanegan, Kevin
Paperback
SQL Server 2025 Unveiled: The AI-Ready Enterprise Database with Microsoft Fabric Integration
Ward, Bob
Paperback
Databricks Data Intelligence Platform: Unlocking the GenAI Revolution
Gupta, Nikhil
Yip, Jason
Paperback
Next-Level A/B Testing: Repeatable, Rapid, and Flexible Product Experimentation
Nassery, Leemay
Paperback
Data Governance: The Definitive Guide: People, Processes, and Tools to Operationalize Data Trustworthiness
Eryurek, Evren
Lakshmanan, Valliappa
Gilad, Uri
Paperback
PostgreSQL Query Optimization: The Ultimate Guide to Building Efficient Queries
Database Expert
Dombrovskaya, Henrietta
Bailliekova, Anna
Paperback
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us about Who We Really Are
Stephens-Davidowitz, Seth
Paperback
Practical Lakehouse Architecture: Designing and Implementing Modern Data Platforms at Scale
Thalpati, Gaurav Ashok
Paperback
Practical Time Series Analysis: Prediction with Statistics and Machine Learning
Nielsen, Aileen
Paperback
First-Party Data Activation: Modernize Your Marketing Data Platform
Magauova, Alina D.
Kennis, Oscar
Joosten, David H.
Paperback
Kafka for Architects: Event-Driven Architecture, Logs, Microservices, Real-Time Event Processing
Gorshkova, Katya
Paperback
High Performance PostgreSQL for Rails: Reliable, Scalable, Maintainable Database Applications
Atkinson, Andrew
Paperback
Text as Data: A New Framework for Machine Learning and the Social Sciences
Stewart, Brandon M.
Grimmer, Justin
Roberts, Margaret E.
Paperback
Analytics the Right Way: A Business Leader's Guide to Putting Data to Productive Use
Wilson, Tim
Sutherland, Joe
Paperback
Analytics Engineering with SQL and dbt: Building Meaningful Data Models at Scale
Machado, Rui
Russa, Hélder
Paperback
Big Data in Der Mobilität: Akteure, Geschäftsmodelle Und Nutzenpotenziale Für Die Welt Von Morgen
Müller-Peters, Horst
Gatzert, Nadine
Knorre, Susanne
Paperback
Excel 2021: Everything you need to know about Excel to go from Beginner to Expert
Wright, Nora E.
Paperback
High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark
Karau, Holden
Warren, Rachel
Paperback
Practical Natural Language Processing: A Comprehensive Guide to Building Real-World NLP Systems
Vajjala, Sowmya
Majumder, Bodhisattwa
Gupta, Anuj
Paperback
SQL Server 2022 Query Performance Tuning: Troubleshoot and Optimize Query Performance
Fritchey, Grant
Paperback
Data Science on the Google Cloud Platform: Implementing End-To-End Real-Time Data Pipelines: From Ingest to Machine Learning
Lakshmanan, Valliappa
Paperback
SQL for Data Scientists: A Beginner's Guide for Building Datasets for Analysis
Teate, Renee M. P.
Paperback
Microsoft Power BI Visual Calculations: Simplifying DAX
Ter Heerdt, Jeroen
Stikkelorum, Madzy
Lelijveld, Marc
Paperback
Blockchain: The Comprehensive Guide to Blockchain Development, Ethereum, Solidity, and Smart Contracts
Schütz, Andreas
Fertig, Tobias
Paperback
Learn PostgreSQL - Second Edition: Use, manage and build secure and scalable databases with PostgreSQL 16
Ferrari, Luca
Pirozzi, Enrico
Paperback
Introduction to Statistics: An Intuitive Guide for Analyzing Data and Unlocking Discoveries
Frost, Jim
Hardcover
Cassandra: The Definitive Guide, (Revised) Third Edition: Distributed Data at Web Scale
Carpenter, Jeff
Hewitt, Eben
Paperback
Ciencia de Datos: Guía completa para principiantes para aprender los reinos de la ciencia de datos
Vance, William
Paperback
Social Media Exposed: Track Disinformation, Spot Bots, and Find Out What's Really Happening
Abrahams, Alexei Sisulu
Paperback
Stream Processing with Apache Flink: Fundamentals, Implementation, and Operation of Streaming Applications
Hueske, Fabian
Kalavri, Vasiliki
Paperback
Aprende SQL en un fin de semana: El curso definitivo para crear y consultar bases de datos
Padial Solier, Antonio
Paperback
Statistical Quantitative Methods in Finance: From Theory to Quantitative Portfolio Management
Ahlawat, Samit
Paperback
Apache Polaris: The Definitive Guide: Enriching Apache Iceberg Data Lakehouses with an Open Source Catalog
Madson, Andrew
Merced, Alex
Shiran, Tomer
Paperback
AWS Certified Data Engineer Study Guide: Associate (DEA-C01) Exam
Gumbo, Chenjerai
Gatt, Adam
Humair, Syed
Paperback
Streaming Databases: Unifying Batch and Stream Processing
Dulay, Hubert
Debusmann, Ralph Matthias
Paperback
The Data Storyteller's Handbook: How to create business impact using data storytelling
Greenbrook, Kat
Paperback
The Data Management Workbook: Practical Exercises for Better Organization, Storage and Use of Your Research Data
Briney, Kristin
Paperback
Implementing Data Mesh: Design, Build, and Implement Data Contracts, Data Products, and Data Mesh
Perrin, Jean-Georges
Broda, Eric
Paperback
Pro Oracle Database 23ai Administration: Manage and Safeguard Your Organization's Data
Malcher, Michelle
Kuhn, Darl
Paperback
Mastering Access 365: An Easy Guide to Building Efficient Databases for Managing Your Data
George, Nathan
Hardcover
Financial Data Science with Python: An Integrated Approach to Analysis, Modeling, and Machine Learning
Chen, Haojun
Paperback
Data Strategy: How to Use Data and Artificial Intelligence to Transform Your Business
Marr, Bernard
Paperback
How to Interpret Data: Using Data to Improve Your Influence and Decision-Making
Kelly, Nicholas
Hardcover
Managing Data as a Product: Design and build data-product-centered socio-technical architectures
Gioia, Andrea
Paperback
High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark
Warren, Rachel
Karau, Holden
Polak, Adi
Paperback
Dataproc Cookbook: Running Spark and Hadoop Workloads in Google Cloud
Sadineni, Narasimha
Venkataraman, Anuyogam
Paperback
Cracking the Data Science Interview: Unlock insider tips from industry experts to master the data science field
Gonzalez, Leondra R.
Stubberfield, Aaren
Paperback
Optimizing DAX: Improving DAX performance in Microsoft Power BI and Analysis Services
Russo, Marco
Ferrari, Alberto
Paperback
